the-other-tim-brown commented on code in PR #13171:
URL: https://github.com/apache/hudi/pull/13171#discussion_r2053139220


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -141,13 +186,13 @@ public ClosableIterator<IndexedRecord> mergeBootstrapReaders(ClosableIterator<In
                                                                Schema skeletonRequiredSchema,
                                                                ClosableIterator<IndexedRecord> dataFileIterator,
                                                                Schema dataRequiredSchema) {
-    return null;
+    return new BootstrapIterator(skeletonFileIterator, skeletonRequiredSchema, dataFileIterator, dataRequiredSchema);
   }
 
   @Override
   public UnaryOperator<IndexedRecord> projectRecord(Schema from, Schema to, Map<String, String> renamedColumns) {
     if (!renamedColumns.isEmpty()) {
-      throw new UnsupportedOperationException("Schema evolution is not supported for the test reader context");
+      throw new UnsupportedOperationException("Schema evolution is not supported for the HoodieAvroReaderContext");
     }
     Map<String, Integer> fromFields = IntStream.range(0, from.getFields().size())

Review Comment:
   +1 to building the transform. While digging into this code path, I have found other places where we can do the same.
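   For illustration, "building the transform" here presumably means precomputing the source-to-target field-position mapping once, so the returned `UnaryOperator` only does cheap positional lookups per record. A minimal self-contained sketch, using plain `Object[]` rows and field-name lists as stand-ins for Avro's `IndexedRecord` and `Schema` (these stand-ins are assumptions for readability, not the PR's actual code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class ProjectionSketch {
  // Hypothetical sketch: resolve the from->to position mapping ONCE up front;
  // the returned operator then only does array index lookups per record.
  static UnaryOperator<Object[]> buildProjection(List<String> from, List<String> to) {
    int[] fromPos = to.stream().mapToInt(from::indexOf).toArray();
    return record -> {
      Object[] projected = new Object[fromPos.length];
      for (int i = 0; i < fromPos.length; i++) {
        projected[i] = record[fromPos[i]];
      }
      return projected;
    };
  }

  public static void main(String[] args) {
    List<String> from = List.of("key", "ts", "value");
    UnaryOperator<Object[]> project = buildProjection(from, List.of("value", "key"));
    // The same precomputed mapping is reused for every record.
    System.out.println(Arrays.toString(project.apply(new Object[]{"k1", 1L, "v1"})));
    System.out.println(Arrays.toString(project.apply(new Object[]{"k2", 2L, "v2"})));
  }
}
```

   The point of the pattern is that the name-based schema lookups happen once per file, not once per record.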



##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -198,4 +243,57 @@ private Object getFieldValueFromIndexedRecord(
     int pos = field.pos();
     return record.get(pos);
   }
+
+  /**
+   * Iterator that traverses the skeleton file and the base file in tandem.
+   * The iterator will only extract the fields requested in the provided schemas.
+   */
+  private static class BootstrapIterator implements ClosableIterator<IndexedRecord> {
+    private final ClosableIterator<IndexedRecord> skeletonFileIterator;
+    private final Schema skeletonRequiredSchema;
+    private final ClosableIterator<IndexedRecord> dataFileIterator;
+    private final Schema dataRequiredSchema;
+    private final Schema mergedSchema;
+    private final int skeletonFields;
+
+    public BootstrapIterator(ClosableIterator<IndexedRecord> skeletonFileIterator, Schema skeletonRequiredSchema,
+                             ClosableIterator<IndexedRecord> dataFileIterator, Schema dataRequiredSchema) {
+      this.skeletonFileIterator = skeletonFileIterator;
+      this.skeletonRequiredSchema = skeletonRequiredSchema;
+      this.dataFileIterator = dataFileIterator;
+      this.dataRequiredSchema = dataRequiredSchema;
+      this.mergedSchema = AvroSchemaUtils.mergeSchemas(skeletonRequiredSchema, dataRequiredSchema);
+      this.skeletonFields = skeletonRequiredSchema.getFields().size();
+    }
+
+    @Override
+    public void close() {
+      skeletonFileIterator.close();
+      dataFileIterator.close();
+    }
+
+    @Override
+    public boolean hasNext() {
+      checkState(dataFileIterator.hasNext() == skeletonFileIterator.hasNext(),
+          "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!");
+      return skeletonFileIterator.hasNext();

Review Comment:
   Yes, that is generally a requirement of an iterator.
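   For context on the lockstep contract being checked in `hasNext`: both iterators must exhaust together, and the (truncated) `next()` presumably stitches the skeleton fields in front of the data fields at an offset of `skeletonFields`. A self-contained sketch with `Object[]` rows standing in for `IndexedRecord` (the names and layout below are assumptions for illustration, not the PR's implementation):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class TandemMergeSketch {
  // Merge one row from each iterator: skeleton fields occupy positions
  // [0, s.length), data fields follow at that offset, mirroring a merged
  // schema of skeletonRequiredSchema followed by dataRequiredSchema.
  static Object[] mergeNext(Iterator<Object[]> skeleton, Iterator<Object[]> data) {
    if (skeleton.hasNext() != data.hasNext()) {
      // Same invariant the checkState in hasNext() enforces.
      throw new IllegalStateException("Bootstrap iterators out of sync");
    }
    Object[] s = skeleton.next();
    Object[] d = data.next();
    Object[] merged = new Object[s.length + d.length];
    System.arraycopy(s, 0, merged, 0, s.length);
    System.arraycopy(d, 0, merged, s.length, d.length);
    return merged;
  }

  public static void main(String[] args) {
    Iterator<Object[]> skel = Collections.singletonList(new Object[]{"uuid-1", "001"}).iterator();
    Iterator<Object[]> data = Collections.singletonList(new Object[]{"k1", "v1"}).iterator();
    System.out.println(Arrays.toString(mergeNext(skel, data)));
  }
}
```

   Checking the sync invariant in `hasNext()` rather than `next()` surfaces a row-count mismatch between the skeleton and data files as early as possible.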



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
