Y Ethan Guo created HUDI-8896:
---------------------------------
Summary: Support reading partition columns for bootstrapped table
in the file group reader
Key: HUDI-8896
URL: https://issues.apache.org/jira/browse/HUDI-8896
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Y Ethan Guo
Fix For: 1.0.1
Right now, reading the partition columns of a bootstrapped table in Spark works
at the HoodieFileGroupReaderBasedParquetFileFormat layer, not at the file group
reader layer. Specifically, when the file group reader is used directly to read
a file slice by merging bootstrap data files and skeleton files, the partition
column values are null. The values are correct only for record keys with
updates in the log records, where the partition columns are read directly by
the Hudi log reader.
Currently, HoodieFileGroupReaderBasedParquetFileFormat applies an additional
projection step that appends the partition values on top of the record iterator
returned by the file group reader:
{code:java}
// Append partition values to rows and project to output schema
appendPartitionAndProject(
  reader.getClosableIterator,
  requestedSchema,
  remainingPartitionSchema,
  outputSchema,
  fileSliceMapping.getPartitionValues,
  fixedPartitionIndexes)
{code}
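To illustrate the kind of step that would need to move into the file group
reader layer, here is a minimal, self-contained sketch of appending a file
slice's partition values to each row of an iterator. It does not use Hudi's
classes: PartitionAppendSketch and appendPartitionValues are hypothetical
names, rows are modeled as plain Object[] arrays, and the projection to an
output schema done by appendPartitionAndProject is left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; these are not Hudi's classes or signatures.
final class PartitionAppendSketch {

  private PartitionAppendSketch() {}

  /**
   * Wraps a row iterator so that every row gets the partition values of its
   * file slice appended after the data columns. Rows are copied, not mutated.
   */
  static Iterator<Object[]> appendPartitionValues(
      Iterator<Object[]> rows, Object[] partitionValues) {
    return new Iterator<Object[]>() {
      @Override
      public boolean hasNext() {
        return rows.hasNext();
      }

      @Override
      public Object[] next() {
        Object[] dataColumns = rows.next();
        // Allocate a wider row: data columns first, partition values last.
        Object[] out =
            Arrays.copyOf(dataColumns, dataColumns.length + partitionValues.length);
        System.arraycopy(
            partitionValues, 0, out, dataColumns.length, partitionValues.length);
        return out;
      }
    };
  }

  public static void main(String[] args) {
    // Rows as they might come back without partition columns (assumed shape).
    List<Object[]> rows = Arrays.asList(
        new Object[] {"key1", 10L},
        new Object[] {"key2", 20L});
    // All rows of one file slice share the same partition values.
    Object[] partitionValues = {"2024-01-01"};

    Iterator<Object[]> it = appendPartitionValues(rows.iterator(), partitionValues);
    while (it.hasNext()) {
      System.out.println(Arrays.toString(it.next()));
    }
  }
}
```

Because all records in a file slice belong to the same partition, the values
can be supplied once per slice and attached lazily as the iterator is consumed,
which is why a wrapper around the record iterator is a natural fit here.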
--
This message was sent by Atlassian Jira
(v8.20.10#820010)