hudi-bot opened a new issue, #17366:
URL: https://github.com/apache/hudi/issues/17366
Right now, reading the partition columns of a bootstrapped table in Spark works at the HoodieFileGroupReaderBasedParquetFileFormat level, not at the file group reader layer. Specifically, when the file group reader is used directly to read a file slice by merging bootstrap data and skeleton files, the partition column values come back null; they are correct only for record keys with updates in the log records, where the partition columns are read out directly through the Hudi log reader.
Currently, HoodieFileGroupReaderBasedParquetFileFormat has additional projection logic to append the partition values on top of the record iterator returned by the file group reader:
{code:java}
// Append partition values to rows and project to output schema
appendPartitionAndProject(
  reader.getClosableIterator,
  requestedSchema,
  remainingPartitionSchema,
  outputSchema,
  fileSliceMapping.getPartitionValues,
  fixedPartitionIndexes)
{code}
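For illustration, the projection above can be sketched as an iterator wrapper that appends fixed partition values to every row. This is a minimal, hypothetical sketch (the class name, `Object[]`-based row representation, and constructor are assumptions for illustration, not Hudi's actual implementation):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: wraps a row iterator and appends fixed partition
// values to each row, mirroring conceptually what a projection like
// appendPartitionAndProject needs to do for bootstrapped file slices.
public class PartitionAppendingIterator implements Iterator<Object[]> {
    private final Iterator<Object[]> inner;
    private final Object[] partitionValues;

    public PartitionAppendingIterator(Iterator<Object[]> inner, Object[] partitionValues) {
        this.inner = inner;
        this.partitionValues = partitionValues;
    }

    @Override
    public boolean hasNext() {
        return inner.hasNext();
    }

    @Override
    public Object[] next() {
        Object[] row = inner.next();
        // Copy the data columns, then append the partition column values.
        Object[] out = Arrays.copyOf(row, row.length + partitionValues.length);
        System.arraycopy(partitionValues, 0, out, row.length, partitionValues.length);
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
            new Object[]{"key1", 10},
            new Object[]{"key2", 20});
        Iterator<Object[]> it =
            new PartitionAppendingIterator(rows.iterator(), new Object[]{"2025-01-31"});
        while (it.hasNext()) {
            System.out.println(Arrays.toString(it.next()));
        }
    }
}
```

The point of the issue is that this appending happens only in the Spark file format layer; a fix at the file group reader layer would make partition values correct for all callers (e.g. compaction) rather than only for Spark queries.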
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8896
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9108
- Fix version(s): 1.1.0
---
## Comments
31/Jan/25 18:17, yihua: A few tests in https://github.com/apache/hudi/pull/12490 failed because of this, as compaction, which uses the file group reader directly to read a bootstrapped file slice, does not write the partition column values properly.