[
https://issues.apache.org/jira/browse/HUDI-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922808#comment-17922808
]
Y Ethan Guo commented on HUDI-8896:
-----------------------------------
A few tests in [https://github.com/apache/hudi/pull/12490] failed because of
this: compaction uses the file group reader directly to read a bootstrapped
file slice, and so does not write the partition column values properly.
> Support reading partition columns for bootstrapped table in the file group
> reader
> ---------------------------------------------------------------------------------
>
> Key: HUDI-8896
> URL: https://issues.apache.org/jira/browse/HUDI-8896
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Priority: Critical
> Fix For: 1.0.1
>
> Original Estimate: 16h
> Remaining Estimate: 16h
>
> Right now, reading the partition columns of a bootstrapped table in Spark is
> handled in HoodieFileGroupReaderBasedParquetFileFormat, not at the file group
> reader layer. Specifically, when the file group reader is used directly to
> read a file slice by merging bootstrap data and skeleton files, the partition
> column values are null; they are correct only for record keys with updates in
> the log records, where the partition columns are read directly through the
> Hudi log reader.
> Currently, HoodieFileGroupReaderBasedParquetFileFormat has additional
> projection logic that appends the partition values on top of the record
> iterator returned by the file group reader:
> {code:java}
> // Append partition values to rows and project to output schema
> appendPartitionAndProject(
>   reader.getClosableIterator,
>   requestedSchema,
>   remainingPartitionSchema,
>   outputSchema,
>   fileSliceMapping.getPartitionValues,
>   fixedPartitionIndexes)
> {code}
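
For illustration only, the idea behind the projection step quoted above can be sketched in plain Java. This is not Hudi code: the class PartitionValueAppender and its method appendPartitionValues are hypothetical names, rows are modeled as Object[] rather than Spark rows, and the sketch only appends partition values without projecting to an output schema.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper, not part of Hudi: appends partition values to each row. */
public class PartitionValueAppender {

  /**
   * Wraps a row iterator so that every emitted row consists of the data
   * columns followed by the partition values of the file slice.
   */
  public static Iterator<Object[]> appendPartitionValues(
      Iterator<Object[]> rows, Object[] partitionValues) {
    return new Iterator<Object[]>() {
      @Override
      public boolean hasNext() {
        return rows.hasNext();
      }

      @Override
      public Object[] next() {
        Object[] dataColumns = rows.next();
        // Copy the data columns into a wider array, leaving room at the end.
        Object[] joined =
            Arrays.copyOf(dataColumns, dataColumns.length + partitionValues.length);
        // Fill the trailing slots with the partition values.
        System.arraycopy(
            partitionValues, 0, joined, dataColumns.length, partitionValues.length);
        return joined;
      }
    };
  }

  public static void main(String[] args) {
    // Assumed sample data: two rows without partition columns.
    List<Object[]> rows =
        Arrays.asList(new Object[] {"key1", 10}, new Object[] {"key2", 20});
    Iterator<Object[]> it =
        appendPartitionValues(rows.iterator(), new Object[] {"2024-01-01"});
    while (it.hasNext()) {
      System.out.println(Arrays.toString(it.next()));
    }
  }
}
```

Because the wrapper sits outside the reader, any caller that consumes the underlying iterator directly never sees the appended values, which matches the gap this issue describes for callers that bypass the file format layer.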
--
This message was sent by Atlassian Jira
(v8.20.10#820010)