[jira] [Created] (HUDI-8896) Support reading partition columns for bootstrapped table in the file group reader

Y Ethan Guo (Jira) Tue, 21 Jan 2025 21:27:14 -0800

Y Ethan Guo created HUDI-8896:
---------------------------------

             Summary: Support reading partition columns for bootstrapped table in the file group reader
                 Key: HUDI-8896
                 URL: https://issues.apache.org/jira/browse/HUDI-8896
             Project: Apache Hudi
          Issue Type: Sub-task
            Reporter: Y Ethan Guo
             Fix For: 1.0.1



Right now, reading the partition columns of a bootstrapped table in Spark works
at the HoodieFileGroupReaderBasedParquetFileFormat layer, not at the file group
reader layer. Specifically, when the file group reader is used directly to read
a file slice by merging the bootstrap data and skeleton files, the partition
column values are null. The partition column values are correct only for record
keys that have updates in the log records, because for those records the
partition columns are read out directly by the Hudi log reader.
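
To illustrate the symptom, the sketch below models how one record of a
bootstrapped file slice is assembled. It assumes (this is not stated in the
issue) that the bootstrap data file does not store the partition column, as is
typical for a Hive-style partitioned source table whose partition values live
only in the path. BootstrapMergeSketch, readRecord, and the map-based row model
are hypothetical and are not Hudi APIs.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hudi code: rows are column-name -> value maps.
class BootstrapMergeSketch {

  /**
   * Stitches a skeleton row (Hudi meta columns) with the matching bootstrap
   * data row, then overlays a log update if one exists (logRecord may be null).
   * Assumption: the bootstrap row lacks partition columns, so any requested
   * column missing from every input comes back as null.
   */
  static Map<String, Object> readRecord(Map<String, Object> skeletonRow,
                                        Map<String, Object> bootstrapRow,
                                        Map<String, Object> logRecord,
                                        List<String> requestedColumns) {
    Map<String, Object> merged = new LinkedHashMap<>(skeletonRow);
    merged.putAll(bootstrapRow);
    if (logRecord != null) {
      // Log records carry complete rows, partition columns included.
      merged.putAll(logRecord);
    }
    Map<String, Object> out = new LinkedHashMap<>();
    for (String col : requestedColumns) {
      out.put(col, merged.get(col)); // null when no input provides the column
    }
    return out;
  }
}
{code}

With a skeleton row {_hoodie_record_key=k1} and a bootstrap row {price=10},
readRecord returns dt=null; passing a log record that contains dt=2025-01-21
returns that value, matching the behavior described above (column names and
values here are illustrative).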

Currently, HoodieFileGroupReaderBasedParquetFileFormat has additional
projection logic that appends the partition values to the records of the
iterator returned by the file group reader:
{code:java}
// Append partition values to rows and project to output schema
appendPartitionAndProject(
  reader.getClosableIterator,
  requestedSchema,
  remainingPartitionSchema,
  outputSchema,
  fileSliceMapping.getPartitionValues,
  fixedPartitionIndexes)
{code}
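
Moving this logic into the file group reader could amount to wrapping its record
iterator. The sketch below is a hypothetical Java illustration, not Hudi's
appendPartitionAndProject: rows are Object[] in requestedSchema order, and
partitionValues holds the values parsed from the file slice's partition path.
It assumes those path values equal the stored column values, which holds for
plain Hive-style partitioning but not necessarily for custom key generators.

{code:java}
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Hudi's implementation.
// Assumption: partition path values equal the stored partition column values.
class AppendPartitionAndProjectSketch {

  static Iterator<Object[]> appendPartitionAndProject(Iterator<Object[]> rows,
                                                      List<String> requestedSchema,
                                                      List<String> partitionSchema,
                                                      Object[] partitionValues,
                                                      List<String> outputSchema) {
    int n = outputSchema.size();
    int[] partitionIndex = new int[n]; // >= 0: take the value from partitionValues
    int[] rowIndex = new int[n];       // >= 0: take the value from the data row
    for (int i = 0; i < n; i++) {
      String col = outputSchema.get(i);
      // Partition columns are resolved from the path first, so a null read
      // from the bootstrap data file never reaches the output.
      partitionIndex[i] = partitionSchema.indexOf(col);
      rowIndex[i] = partitionIndex[i] >= 0 ? -1 : requestedSchema.indexOf(col);
      if (partitionIndex[i] < 0 && rowIndex[i] < 0) {
        throw new IllegalArgumentException("Column not found: " + col);
      }
    }
    return new Iterator<Object[]>() {
      @Override
      public boolean hasNext() {
        return rows.hasNext();
      }

      @Override
      public Object[] next() {
        Object[] row = rows.next();
        Object[] out = new Object[n];
        for (int i = 0; i < n; i++) {
          out[i] = partitionIndex[i] >= 0 ? partitionValues[partitionIndex[i]] : row[rowIndex[i]];
        }
        return out;
      }
    };
  }
}
{code}

Because the projection indexes are computed once per file slice, the per-row
cost is a single array copy; and since every record gets the same partition
values, rows read from the skeleton/bootstrap merge and rows updated from the
log end up consistent under the stated assumption.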



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
