[PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

via GitHub Tue, 02 Apr 2024 10:14:07 -0700


jonvex opened a new pull request, #10954:
URL: https://github.com/apache/hudi/pull/10954


   ### Change Logs
   Subtask of https://issues.apache.org/jira/browse/HUDI-7045
   Extracts from https://github.com/apache/hudi/pull/10278
   
   Spark Parquet readers are currently created per partition; we want to create a reader for each file instead. This PR ports the Spark Parquet readers for each supported Spark version and removes the partition iterator.
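   The snippet below is a rough, hypothetical illustration of the API shape, not code from this PR or from Spark. The per-file reader becomes the primitive, and a partition scan, where one is still needed, becomes a loop over files. `SingleFileReader`, `PartitionScan` and `InMemoryReader` are made-up names. In-memory string rows stand in for Parquet files and Spark `InternalRow`s.

   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   import java.util.Map;

   // Hypothetical file-level reader contract: one call reads exactly one file.
   interface SingleFileReader<R> {
       Iterator<R> read(String filePath);
   }

   // Hypothetical helper: a partition scan is composed from the per-file reader
   // instead of being baked into the reader itself.
   final class PartitionScan {
       static <R> List<R> readPartition(List<String> files, SingleFileReader<R> reader) {
           List<R> out = new ArrayList<>();
           for (String file : files) {
               Iterator<R> it = reader.read(file);
               while (it.hasNext()) {
                   out.add(it.next());
               }
           }
           return out;
       }
   }

   // Stand-in reader: "files" are in-memory lists of rows keyed by path.
   final class InMemoryReader implements SingleFileReader<String> {
       private final Map<String, List<String>> files;

       InMemoryReader(Map<String, List<String>> files) {
           this.files = files;
       }

       @Override
       public Iterator<String> read(String filePath) {
           List<String> rows = files.get(filePath);
           if (rows == null) {
               throw new IllegalArgumentException("no such file: " + filePath);
           }
           return rows.iterator();
       }
   }
   ```

   With the per-file reader as the primitive, a caller such as the new file group reader can open only the file it needs, without building a reader for the whole partition.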
   
   To help verify the ported code, I have listed the Spark version each reader was ported from in the Javadoc for `readParquetFile`.
   You can use the following link and switch between tags to see the code for that Spark version:
   
https://github.com/apache/spark/blob/v2.4.8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
   
   ### Impact
   
   Subtask for schema evolution support in the new file group (fg) reader.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.