yihua commented on code in PR #13223:
URL: https://github.com/apache/hudi/pull/13223#discussion_r2065546587
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -228,8 +234,27 @@ private ClosableIterator<T> makeBootstrapBaseFileIterator(HoodieBaseFile baseFil
if (start != 0) {
throw new IllegalArgumentException("Filegroup reader is doing bootstrap merge but we are not reading from the start of the base file");
}
+ PartitionPathParser partitionPathParser = new PartitionPathParser();
+ Object[] partitionValues = partitionPathParser.getPartitionFieldVals(partitionPathFields, partitionPath, readerContext.getSchemaHandler().getTableSchema());
+ // filter out the partition values that are not required by the data schema
+ Object[] filteredPartitionValues = new Object[0];
+ Option<String[]> filteredPartitionPathFields = Option.empty();
+ if (partitionPathFields.isPresent()) {
+ Schema dataSchema = dataFileIterator.get().getRight();
+ List<String> fields = new ArrayList<>();
+ List<Object> values = new ArrayList<>();
+ for (int i = 0; i < partitionPathFields.get().length; i++) {
+ String field = partitionPathFields.get()[i];
+ if (dataSchema.getField(field) != null) {
Review Comment:
To clarify, the `dataSchema` can contain the partition column which can be
missing in the bootstrap data file on storage, and the records returned by
`dataFileIterator.get().getLeft()` contain such a field, correct?
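For reference, the filtering step this question is about can be sketched in isolation. This is a hypothetical, JDK-only sketch: `PartitionFieldFilter`, `filter` and `Filtered` are illustrative names, not Hudi APIs, and the Avro lookup `dataSchema.getField(field) != null` is approximated by a `Set<String>` of field names. It assumes the parsed partition values line up index-by-index with `partitionPathFields`, which the hunk implies but does not show in full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: keep only the partition fields (and their parsed values)
 * that also appear in the data schema. The Avro Schema is approximated by a set
 * of field names so the example needs only the JDK.
 */
final class PartitionFieldFilter {

  /** Parallel arrays of the retained field names and values. */
  static final class Filtered {
    final String[] fields;
    final Object[] values;

    Filtered(String[] fields, Object[] values) {
      this.fields = fields;
      this.values = values;
    }
  }

  static Filtered filter(String[] partitionPathFields,
                         Object[] partitionValues,
                         Set<String> dataSchemaFieldNames) {
    // Assumption: values are positionally aligned with the field names.
    if (partitionPathFields.length != partitionValues.length) {
      throw new IllegalArgumentException("partition fields and values must have the same length");
    }
    List<String> fields = new ArrayList<>();
    List<Object> values = new ArrayList<>();
    for (int i = 0; i < partitionPathFields.length; i++) {
      String field = partitionPathFields[i];
      // Stands in for `dataSchema.getField(field) != null` in the diff.
      if (dataSchemaFieldNames.contains(field)) {
        fields.add(field);
        values.add(partitionValues[i]);
      }
    }
    return new Filtered(fields.toArray(new String[0]), values.toArray());
  }
}
```

If the answer to the question is "yes", a partition column that is absent from the bootstrap file on storage would still pass this check whenever the data schema declares it.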
##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -290,6 +300,11 @@ public IndexedRecord next() {
Schema.Field sourceField = dataRecord.getSchema().getField(dataField.name());
mergedRecord.put(dataField.pos() + skeletonFields, dataRecord.get(sourceField.pos()));
}
+ for (int i = 0; i < partitionFieldPositions.length; i++) {
+ if (mergedRecord.get(i) != null) {
Review Comment:
Should this be `mergedRecord.get(partitionFieldPositions[i]) != null`?
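To make the indexing concern concrete: the merged record places the skeleton fields before the data fields (data fields are written at `dataField.pos() + skeletonFields`), so the loop counter `i` and a partition field's position in the record generally differ. The hypothetical sketch below models the merged record as an `Object[]` instead of an Avro `IndexedRecord` and only reports which partition positions are already populated; what the real loop body does is not visible in this excerpt, and `PartitionPositionCheck` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: the lookup must use the partition field's position in the
 * merged record (partitionFieldPositions[i]), not the loop counter i.
 */
final class PartitionPositionCheck {

  /** Returns the record positions of partition fields that already hold a non-null value. */
  static List<Integer> populatedPartitionPositions(Object[] mergedRecord, int[] partitionFieldPositions) {
    List<Integer> populated = new ArrayList<>();
    for (int i = 0; i < partitionFieldPositions.length; i++) {
      int pos = partitionFieldPositions[i];
      // mergedRecord[i] would read whatever sits in slot i (e.g. a skeleton field),
      // which is unrelated to the i-th partition field.
      if (mergedRecord[pos] != null) {
        populated.add(pos);
      }
    }
    return populated;
  }
}
```

With a record `{"c1", "k1", null, "2024-01-01"}` and partition positions `{2, 3}`, indexing by position reports only position 3, whereas indexing by `i` would inspect slots 0 and 1 and find both non-null.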
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:
##########
@@ -359,7 +359,9 @@ public Map<String, Object> updateSchemaAndResetOrderingValInMetadata(Map<String,
public abstract ClosableIterator<T> mergeBootstrapReaders(ClosableIterator<T> skeletonFileIterator,
Schema skeletonRequiredSchema,
ClosableIterator<T> dataFileIterator,
- Schema dataRequiredSchema);
+ Schema dataRequiredSchema,
+ Option<String[]> partitionFields,
+ Object[] partitionValues);
Review Comment:
nit: could `partitionFields` and `partitionValues` be combined into a single `List<Pair<String, Object>>` parameter?
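A minimal sketch of what the suggested parameter shape could look like, assuming the pairs are built from the current `Option<String[]>` plus parallel `Object[]`. It is hypothetical and JDK-only: `Map.Entry` stands in for the `Pair` type in the suggestion, a `null` array stands in for `Option.empty()`, and `PartitionPairs`/`zip` are illustrative names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: turn parallel partition field/value arrays into one list
 * of (name, value) pairs. Map.Entry stands in for a Pair type.
 */
final class PartitionPairs {

  static List<Map.Entry<String, Object>> zip(String[] partitionFields, Object[] partitionValues) {
    if (partitionFields == null) {
      // Plays the role of Option.empty(): no partition fields to merge.
      return Collections.emptyList();
    }
    if (partitionFields.length != partitionValues.length) {
      throw new IllegalArgumentException("partition fields and values must have the same length");
    }
    List<Map.Entry<String, Object>> pairs = new ArrayList<>(partitionFields.length);
    for (int i = 0; i < partitionFields.length; i++) {
      // SimpleImmutableEntry permits null values, unlike Map.entry(...).
      pairs.add(new AbstractMap.SimpleImmutableEntry<>(partitionFields[i], partitionValues[i]));
    }
    return pairs;
  }
}
```

With a single list, callers cannot pass field names and values of different lengths, and an empty list can replace the `Option` wrapper.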
--
This is an automated message from the Apache Git Service.