[ 
https://issues.apache.org/jira/browse/HUDI-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-8896:
--------------------------
    Status: In Progress  (was: Open)

> Support reading partition columns for bootstrapped table in the file group 
> reader
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-8896
>                 URL: https://issues.apache.org/jira/browse/HUDI-8896
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Assignee: Lin Liu
>            Priority: Critical
>             Fix For: 1.0.1
>
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
> Currently, reading the partition columns of a bootstrapped table in Spark is 
> handled in HoodieFileGroupReaderBasedParquetFileFormat, not at the file group 
> reader layer. Specifically, when the file group reader is used directly to 
> read a file slice by merging bootstrap data and skeleton files, the partition 
> column values are null. They are correct only for record keys with updates 
> in the log records, where the partition columns are read directly through 
> the Hudi log reader.
> HoodieFileGroupReaderBasedParquetFileFormat applies an additional projection 
> step that appends the partition values on top of the record iterator 
> returned by the file group reader:
> {code:java}
> // Append partition values to rows and project to output schema
> appendPartitionAndProject(
>   reader.getClosableIterator,
>   requestedSchema,
>   remainingPartitionSchema,
>   outputSchema,
>   fileSliceMapping.getPartitionValues,
>   fixedPartitionIndexes)
> {code}
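
The append-and-project step described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's implementation: the class name PartitionAppendSketch, the use of Object[] as a row, and the outputPositions parameter are all assumptions made for illustration, and the real appendPartitionAndProject works on Spark rows and schemas rather than plain arrays. The sketch only shows the general idea of wrapping a record iterator so that each row gets the file slice's partition values appended and is then reordered into the output layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionAppendSketch {

  // Hypothetical helper (not a Hudi API): wraps a row iterator, appends the
  // same partition values to every row, then keeps only the requested
  // positions in the requested order.
  static Iterator<Object[]> appendPartitionAndProject(
      Iterator<Object[]> rows, Object[] partitionValues, int[] outputPositions) {
    return new Iterator<Object[]>() {
      @Override
      public boolean hasNext() {
        return rows.hasNext();
      }

      @Override
      public Object[] next() {
        Object[] data = rows.next();
        // Data columns first, partition columns appended at the end.
        Object[] joined = Arrays.copyOf(data, data.length + partitionValues.length);
        System.arraycopy(partitionValues, 0, joined, data.length, partitionValues.length);
        // Project the joined row into the output column order.
        Object[] out = new Object[outputPositions.length];
        for (int i = 0; i < out.length; i++) {
          out[i] = joined[outputPositions[i]];
        }
        return out;
      }
    };
  }

  public static void main(String[] args) {
    // Rows as they might come back without partition columns (made-up data).
    List<Object[]> rows = new ArrayList<>();
    rows.add(new Object[] {"key1", 10});
    rows.add(new Object[] {"key2", 20});

    Iterator<Object[]> it = appendPartitionAndProject(
        rows.iterator(), new Object[] {"2024-01-01"}, new int[] {0, 2, 1});
    while (it.hasNext()) {
      System.out.println(Arrays.toString(it.next()));
    }
  }
}
```

In this sketch the partition value is supplied from outside the iterator, which mirrors why rows read directly through the file group reader come back without it: nothing at that layer fills the value in.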



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
