[PR] [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format [hudi]

via GitHub Wed, 08 Nov 2023 12:34:11 -0800


yihua opened a new pull request, #10020:
URL: https://github.com/apache/hudi/pull/10020


   ### Change Logs
   
   This PR adds support for reading file groups that contain only log files in the file group reader-based Spark parquet file format (`HoodieFileGroupReaderBasedParquetFileFormat`).
   - In `HoodieFileGroupReaderBasedParquetFileFormat#buildReaderWithPartitionValues`, the record iterator from the new file group reader is returned when there are only log files in a file group.
   - Fixes the log file iterator in `SparkFileFormatInternalRowReaderContext` so that it properly projects the data based on the required / reader schema.
   - Adds new tests for reading only log files in `TestHoodieFileGroupReaderBase`.
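
The first two changes above can be illustrated with a minimal sketch. It is not the code in this PR: `LogOnlyReadSketch`, `FileGroup`, `isLogOnly`, `project`, and `read` are hypothetical names, and records are modeled as plain maps rather than Spark `InternalRow`s. It only shows the idea of taking the log-only path when a file group has no base file, and of projecting each log record onto the required fields before returning it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: every name here is hypothetical, not Hudi's API.
final class LogOnlyReadSketch {

    // Stand-in for a file group: an optional base file plus zero or more log files.
    static final class FileGroup {
        final Optional<String> baseFile;
        final List<String> logFiles;

        FileGroup(Optional<String> baseFile, List<String> logFiles) {
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }

        // True when there is no base file and at least one log file.
        boolean isLogOnly() {
            return !baseFile.isPresent() && !logFiles.isEmpty();
        }
    }

    // Keep only the required fields, in the order the required schema lists them.
    static Map<String, Object> project(Map<String, Object> record, List<String> requiredFields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : requiredFields) {
            projected.put(field, record.get(field));
        }
        return projected;
    }

    // Return an iterator over projected log records for a log-only file group.
    // The base-file path is deliberately left out of this sketch.
    static Iterator<Map<String, Object>> read(FileGroup fileGroup,
                                              List<Map<String, Object>> logRecords,
                                              List<String> requiredFields) {
        if (!fileGroup.isLogOnly()) {
            throw new UnsupportedOperationException("base file path is out of scope for this sketch");
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map<String, Object> record : logRecords) {
            rows.add(project(record, requiredFields));
        }
        return rows.iterator();
    }
}
```

Without the projection step, a reader that asked for a subset of columns would receive rows shaped by the full log record schema, which is the kind of mismatch the second bullet describes fixing.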
   
   ### Impact
   
   Improves functionality: the file group reader-based Spark parquet file format can now read file groups that contain only log files.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


