the-other-tim-brown commented on code in PR #13223:
URL: https://github.com/apache/hudi/pull/13223#discussion_r2069198471


##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -228,8 +234,27 @@ private ClosableIterator<T> makeBootstrapBaseFileIterator(HoodieBaseFile baseFil
       if (start != 0) {
         throw new IllegalArgumentException("Filegroup reader is doing bootstrap merge but we are not reading from the start of the base file");
       }
+      PartitionPathParser partitionPathParser = new PartitionPathParser();
+      Object[] partitionValues = partitionPathParser.getPartitionFieldVals(partitionPathFields, partitionPath, readerContext.getSchemaHandler().getTableSchema());
+      // filter out the partition values that are not required by the data schema
+      Object[] filteredPartitionValues = new Object[0];
+      Option<String[]> filteredPartitionPathFields = Option.empty();
+      if (partitionPathFields.isPresent()) {
+        Schema dataSchema = dataFileIterator.get().getRight();
+        List<String> fields = new ArrayList<>();
+        List<Object> values = new ArrayList<>();
+        for (int i = 0; i < partitionPathFields.get().length; i++) {
+          String field = partitionPathFields.get()[i];
+          if (dataSchema.getField(field) != null) {
+            fields.add(field);
+            values.add(partitionValues[i]);
+          }
+        }
+        filteredPartitionPathFields = fields.isEmpty() ? Option.empty() : Option.of(fields.toArray(new String[0]));
+        filteredPartitionValues = values.toArray(new Object[0]);
+      }

Review Comment:
   The parser requires the table schema because it needs each partition
   field's type to know how to parse that field's value from the path. For
   example, a single date field can span multiple folders in the path.
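
   As a rough illustration of why the type matters, here is a minimal sketch.
   It is not the actual `PartitionPathParser` code: the class
   `PartitionPathSketch`, the `FieldType` enum, the `parse` method, and the
   assumed `yyyy/MM/dd` folder layout for dates are all hypothetical.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {
  // Hypothetical stand-in for the type information a table schema would provide.
  enum FieldType { STRING, INT, DATE }

  // Parses a partition path into one value per partition field.
  // Assumption: a DATE field is laid out as three folders (yyyy/MM/dd),
  // while STRING and INT fields occupy exactly one folder each.
  static Object[] parse(String partitionPath, Map<String, FieldType> partitionFields) {
    String[] segments = partitionPath.split("/");
    Object[] values = new Object[partitionFields.size()];
    int segment = 0; // index of the next unread folder in the path
    int i = 0;       // index of the next partition field value to fill
    for (Map.Entry<String, FieldType> field : partitionFields.entrySet()) {
      switch (field.getValue()) {
        case DATE:
          // Without the type, there is no way to know this field consumes three folders.
          values[i++] = LocalDate.of(
              Integer.parseInt(segments[segment]),
              Integer.parseInt(segments[segment + 1]),
              Integer.parseInt(segments[segment + 2]));
          segment += 3;
          break;
        case INT:
          values[i++] = Integer.parseInt(segments[segment++]);
          break;
        default:
          values[i++] = segments[segment++];
      }
    }
    return values;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the partition fields in path order.
    Map<String, FieldType> fields = new LinkedHashMap<>();
    fields.put("region", FieldType.STRING);
    fields.put("event_date", FieldType.DATE);
    Object[] values = parse("us/2024/03/15", fields);
    System.out.println(values[0] + " " + values[1]); // prints "us 2024-03-15"
  }
}
```

   Here the path has four folders but only two partition fields. Mapping
   folders to fields one-to-one would misalign the values, which is why the
   type of each field has to be known before parsing.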


