[jira] [Commented] (HUDI-8896) Support reading partition columns for bootstrapped table in the file group reader

Y Ethan Guo (Jira) Fri, 31 Jan 2025 10:18:10 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922808#comment-17922808
 ]


Y Ethan Guo commented on HUDI-8896:
-----------------------------------

A few tests in [https://github.com/apache/hudi/pull/12490] failed because of 
this: compaction uses the file group reader directly to read a bootstrapped 
file slice, so the partition column values are not written properly.

> Support reading partition columns for bootstrapped table in the file group 
> reader
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-8896
>                 URL: https://issues.apache.org/jira/browse/HUDI-8896
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Priority: Critical
>             Fix For: 1.0.1
>
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
> Currently, reading the partition columns of a bootstrapped table in Spark is 
> handled in HoodieFileGroupReaderBasedParquetFileFormat, not at the file group 
> reader layer. Specifically, when the file group reader is used directly to 
> read a file slice by merging bootstrap data and skeleton files, the partition 
> column values are null. Only records whose keys have updates in the log 
> records have correct partition column values, because those values are read 
> directly through the Hudi log reader.
> HoodieFileGroupReaderBasedParquetFileFormat has additional projection logic 
> that appends the partition values on top of the record iterator returned by 
> the file group reader:
> {code:java}
> // Append partition values to rows and project to output schema
> appendPartitionAndProject(
>   reader.getClosableIterator,
>   requestedSchema,
>   remainingPartitionSchema,
>   outputSchema,
>   fileSliceMapping.getPartitionValues,
>   fixedPartitionIndexes)
> {code}
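
To illustrate the behavior described above, here is a minimal, self-contained sketch of filling partition values into rows coming out of a record iterator. It is not Hudi code: the class PartitionValueAppender, the method appendPartitionValues, and the Object[] row representation are hypothetical names chosen for this example. It also assumes that a partition column should only be filled when it is null, so that values already read from log records are left untouched.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hudi.
class PartitionValueAppender {

    /**
     * Wraps a row iterator and fills the given column positions with the
     * partition values of the file slice, but only where the column is null.
     */
    static Iterator<Object[]> appendPartitionValues(
            Iterator<Object[]> rows, int[] partitionIndexes, Object[] partitionValues) {
        if (partitionIndexes.length != partitionValues.length) {
            throw new IllegalArgumentException("indexes and values must have the same length");
        }
        return new Iterator<Object[]>() {
            @Override
            public boolean hasNext() {
                return rows.hasNext();
            }

            @Override
            public Object[] next() {
                // Copy the row so the underlying iterator's data is not mutated.
                Object[] row = rows.next().clone();
                for (int i = 0; i < partitionIndexes.length; i++) {
                    int idx = partitionIndexes[i];
                    if (row[idx] == null) {
                        row[idx] = partitionValues[i];
                    }
                }
                return row;
            }
        };
    }

    public static void main(String[] args) {
        // Row 1 mimics a record merged from bootstrap data and skeleton files (partition column is null).
        // Row 2 mimics a record updated through the log reader (partition column already set).
        List<Object[]> rows = Arrays.asList(
            new Object[] {"key1", 10, null},
            new Object[] {"key2", 20, "2025-01-31"});
        Iterator<Object[]> it =
            appendPartitionValues(rows.iterator(), new int[] {2}, new Object[] {"2025-01-30"});
        while (it.hasNext()) {
            System.out.println(Arrays.toString(it.next()));
        }
    }
}
```

In this sketch the wrapping happens on top of the iterator, which mirrors where the projection sits today. Supporting this at the file group reader layer would mean the reader itself returns rows with the partition columns already populated.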



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
