alexeykudinkin commented on code in PR #7461:
URL: https://github.com/apache/hudi/pull/7461#discussion_r1085898072


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -123,16 +123,10 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
         Configuration bootstrapFileConfig = new Configuration(table.getHadoopConf());
         bootstrapFileReader =
             HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(bootstrapFileConfig, bootstrapFilePath);
-        // NOTE: It's important for us to rely on writer's schema here
-        //         - When records will be read by Parquet reader, if schema will be decoded from the
-        //         file itself by taking its Parquet one and converting it to Avro. This will be problematic
-        //         w/ schema validations of the records since Avro's schemas also validate corresponding
-        //         qualified names of the structs, which could not be reconstructed when converting from
-        //         Parquet to Avro (b/c Parquet doesn't bear these)
-        Schema bootstrapSchema = mergeHandle.getWriterSchema();
+
         recordIterator = new MergingIterator<>(
             baseFileRecordIterator,
-            bootstrapFileReader.getRecordIterator(bootstrapSchema),
+            bootstrapFileReader.getRecordIterator(),
Review Comment:
   We actually can't use the writer's schema here, since it might differ from the schema the file was actually written with (with respect to nullability, for example).
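
   To make the nullability concern concrete, here is a minimal stdlib-only sketch. It is not Hudi or Avro code; `FieldSchema` and `validate` are hypothetical stand-ins for Avro-style schema validation. A record legally written under a nullable file schema fails validation when read with a stricter writer schema:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not Hudi/Avro APIs): a field schema that
// tracks nullability, and a validator mimicking Avro-style null checks.
public class SchemaNullabilityDemo {
  record FieldSchema(String name, boolean nullable) {}

  // Reject records whose fields violate the schema's nullability.
  static boolean validate(List<FieldSchema> schema, Map<String, Object> record) {
    for (FieldSchema f : schema) {
      if (!f.nullable() && record.get(f.name()) == null) {
        return false; // non-nullable field holds null -> invalid
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // The file was actually written with a nullable "city" field...
    List<FieldSchema> fileSchema = List.of(new FieldSchema("city", true));
    // ...but the writer schema declares the same field non-null.
    List<FieldSchema> writerSchema = List.of(new FieldSchema("city", false));

    Map<String, Object> record = new HashMap<>();
    record.put("city", null); // legal under the file schema

    System.out.println(validate(fileSchema, record));   // true
    System.out.println(validate(writerSchema, record)); // false
  }
}
```

   Letting the reader derive the schema from the file itself sidesteps this mismatch, which is the rationale for the change above.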



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
