[
https://issues.apache.org/jira/browse/HUDI-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lin Liu updated HUDI-8896:
--------------------------
Status: In Progress (was: Open)
> Support reading partition columns for bootstrapped table in the file group
> reader
> ---------------------------------------------------------------------------------
>
> Key: HUDI-8896
> URL: https://issues.apache.org/jira/browse/HUDI-8896
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Assignee: Lin Liu
> Priority: Critical
> Fix For: 1.0.1
>
> Original Estimate: 16h
> Remaining Estimate: 16h
>
> Right now, reading the partition columns of a bootstrapped table in Spark works
> only at the HoodieFileGroupReaderBasedParquetFileFormat layer, not at the file
> group reader layer. Specifically, when the file group reader is used directly
> to read a file slice by merging bootstrap data and skeleton files, the
> partition column values are null. They are correct only for record keys with
> updates in the log records, where the partition columns are read out directly
> through the Hudi log reader.
> Currently, HoodieFileGroupReaderBasedParquetFileFormat has additional
> projection logic that appends the partition values on top of the record
> iterator returned by the file group reader:
> {code:java}
> // Append partition values to rows and project to output schema
> appendPartitionAndProject(
>   reader.getClosableIterator,
>   requestedSchema,
>   remainingPartitionSchema,
>   outputSchema,
>   fileSliceMapping.getPartitionValues,
>   fixedPartitionIndexes)
> {code}
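
For illustration only, the idea of appending partition values on top of a record iterator can be sketched in plain Java as below. This is a minimal sketch and not Hudi's appendPartitionAndProject: the class PartitionAppendSketch, the method appendPartitionValues, and the representation of rows as Object[] are all hypothetical, and the projection to an output schema is left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; the names and the Object[] row model are assumptions,
// not part of Hudi.
final class PartitionAppendSketch {

    // Wraps a row iterator so that each returned row is a copy of the
    // original row with the partition values appended at the end.
    static Iterator<Object[]> appendPartitionValues(
            Iterator<Object[]> rows, Object[] partitionValues) {
        return new Iterator<Object[]>() {
            @Override
            public boolean hasNext() {
                return rows.hasNext();
            }

            @Override
            public Object[] next() {
                Object[] row = rows.next();
                // Copy the data columns, leaving room for the partition columns.
                Object[] out = Arrays.copyOf(row, row.length + partitionValues.length);
                System.arraycopy(partitionValues, 0, out, row.length, partitionValues.length);
                return out;
            }
        };
    }

    public static void main(String[] args) {
        // Two rows without partition columns, as a reader might return them.
        List<Object[]> rows = Arrays.asList(
                new Object[] {"key1", 10},
                new Object[] {"key2", 20});
        Iterator<Object[]> it =
                appendPartitionValues(rows.iterator(), new Object[] {"2024-01-01"});
        while (it.hasNext()) {
            System.out.println(Arrays.toString(it.next()));
        }
    }
}
```

In this sketch the wrapping happens outside the reader that produces the rows, which mirrors the layering described in the issue: a caller that consumes the underlying iterator directly never sees the partition values.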