yihua commented on code in PR #13223:
URL: https://github.com/apache/hudi/pull/13223#discussion_r2065546587


##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -228,8 +234,27 @@ private ClosableIterator<T> makeBootstrapBaseFileIterator(HoodieBaseFile baseFil
       if (start != 0) {
         throw new IllegalArgumentException("Filegroup reader is doing bootstrap merge but we are not reading from the start of the base file");
       }
+      PartitionPathParser partitionPathParser = new PartitionPathParser();
+      Object[] partitionValues = partitionPathParser.getPartitionFieldVals(partitionPathFields, partitionPath, readerContext.getSchemaHandler().getTableSchema());
+      // filter out the partition values that are not required by the data schema
+      Object[] filteredPartitionValues = new Object[0];
+      Option<String[]> filteredPartitionPathFields = Option.empty();
+      if (partitionPathFields.isPresent()) {
+        Schema dataSchema = dataFileIterator.get().getRight();
+        List<String> fields = new ArrayList<>();
+        List<Object> values = new ArrayList<>();
+        for (int i = 0; i < partitionPathFields.get().length; i++) {
+          String field = partitionPathFields.get()[i];
+          if (dataSchema.getField(field) != null) {

Review Comment:
   To clarify: the `dataSchema` can contain the partition column even though that column may be missing from the bootstrap data file on storage, and the records returned by `dataFileIterator.get().getLeft()` contain that field, correct?
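For reference, the filtering step in this hunk can be sketched in isolation. This is a minimal sketch, not the PR's code: it assumes the data schema can be represented by a plain set of field names (the PR checks an Avro `Schema` via `dataSchema.getField(field) != null`), and `PartitionFieldFilter` and `filterByDataSchema` are hypothetical names. The point it illustrates is that fields and values must be dropped together so that `fields[i]` still lines up with `values[i]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the filtering loop; not Hudi's API.
final class PartitionFieldFilter {

  // Partition fields kept after filtering, with their values in the same order.
  static final class Filtered {
    final String[] fields;
    final Object[] values;

    Filtered(String[] fields, Object[] values) {
      this.fields = fields;
      this.values = values;
    }
  }

  private PartitionFieldFilter() {
  }

  // Keeps only the partition fields declared by the data schema, preserving the
  // positional pairing between partitionFields[i] and partitionValues[i].
  // dataSchemaFieldNames stands in for an Avro lookup: schema.getField(name) != null.
  static Filtered filterByDataSchema(String[] partitionFields,
                                     Object[] partitionValues,
                                     Set<String> dataSchemaFieldNames) {
    if (partitionFields.length != partitionValues.length) {
      throw new IllegalArgumentException("partition fields and values must have the same length");
    }
    List<String> fields = new ArrayList<>();
    List<Object> values = new ArrayList<>();
    for (int i = 0; i < partitionFields.length; i++) {
      if (dataSchemaFieldNames.contains(partitionFields[i])) {
        fields.add(partitionFields[i]);
        values.add(partitionValues[i]);
      }
    }
    return new Filtered(fields.toArray(new String[0]), values.toArray());
  }
}
```

For example, with partition fields `{"region", "dt"}`, values `{"us", "2024-01-01"}` and a data schema that declares only `dt`, the result keeps just `dt` with value `"2024-01-01"`.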



##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -290,6 +300,11 @@ public IndexedRecord next() {
         Schema.Field sourceField = dataRecord.getSchema().getField(dataField.name());
         mergedRecord.put(dataField.pos() + skeletonFields, dataRecord.get(sourceField.pos()));
       }
+      for (int i = 0; i < partitionFieldPositions.length; i++) {
+        if (mergedRecord.get(i) != null) {

Review Comment:
   Should this be `mergedRecord.get(partitionFieldPositions[i]) != null`?
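To illustrate why the index matters, here is a minimal sketch in which an `Object[]` stands in for the Avro `IndexedRecord`. The loop body is not shown in this hunk, so the fill-only-if-null behavior below is an assumption, and `PartitionValueFiller` / `fillPartitionValues` are hypothetical names. With `get(i)`, the first iteration inspects slot 0 (typically a skeleton metadata field) instead of the partition column's slot.

```java
// Hypothetical sketch: Object[] stands in for an Avro IndexedRecord.
final class PartitionValueFiller {
  private PartitionValueFiller() {
  }

  // For each partition field, check the record slot at that field's position
  // (partitionFieldPositions[i]), not at the loop index i. Assumption: slots the
  // data file already populated are left untouched; null slots receive the
  // partition value parsed from the partition path.
  static void fillPartitionValues(Object[] mergedRecord,
                                  int[] partitionFieldPositions,
                                  Object[] partitionValues) {
    for (int i = 0; i < partitionFieldPositions.length; i++) {
      int pos = partitionFieldPositions[i];
      if (mergedRecord[pos] != null) {
        continue; // value already present in the data file
      }
      mergedRecord[pos] = partitionValues[i];
    }
  }
}
```

With a record `{"key1", "file1", 42, null}` and the partition column at position 3, checking slot `i = 0` would see `"key1"` and skip the fill, leaving the partition column null; checking slot `partitionFieldPositions[0] = 3` fills it.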



##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:
##########
@@ -359,7 +359,9 @@ public Map<String, Object> updateSchemaAndResetOrderingValInMetadata(Map<String,
   public abstract ClosableIterator<T> mergeBootstrapReaders(ClosableIterator<T> skeletonFileIterator,
                                                             Schema skeletonRequiredSchema,
                                                             ClosableIterator<T> dataFileIterator,
-                                                            Schema dataRequiredSchema);
+                                                            Schema dataRequiredSchema,
+                                                            Option<String[]> partitionFields,
+                                                            Object[] partitionValues);

Review Comment:
   nit: use `List<Pair<String, Object>>`?
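A rough sketch of what the suggested shape could look like: the parallel `Option<String[]>` / `Object[]` arguments collapse into one list of (field, value) entries, so a field can never drift out of step with its value, and a non-partitioned table could pass an empty list. To keep the sketch self-contained it uses the JDK's `Map.Entry` (via `AbstractMap.SimpleImmutableEntry`) as a stand-in for Hudi's `Pair`; `PartitionPairs` and `zip` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; Map.Entry stands in for Hudi's Pair<String, Object>.
final class PartitionPairs {
  private PartitionPairs() {
  }

  // Zips parallel arrays into an unmodifiable list of (field, value) entries,
  // rejecting inputs where the two arrays disagree in length.
  static List<Map.Entry<String, Object>> zip(String[] fields, Object[] values) {
    if (fields.length != values.length) {
      throw new IllegalArgumentException("expected one value per partition field, got "
          + fields.length + " fields and " + values.length + " values");
    }
    List<Map.Entry<String, Object>> pairs = new ArrayList<>(fields.length);
    for (int i = 0; i < fields.length; i++) {
      pairs.add(new AbstractMap.SimpleImmutableEntry<>(fields[i], values[i]));
    }
    return Collections.unmodifiableList(pairs);
  }
}
```

A `mergeBootstrapReaders` implementation would then iterate the entries directly instead of indexing two arrays in lockstep.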



-- 
This is an automated message from the Apache Git Service.