[PR] [HUDI-8621] Revert single file slice optimisation for getRecordsByKeys in MDT table [hudi]

via GitHub Wed, 15 Jan 2025 03:22:45 -0800


lokeshj1703 opened a new pull request, #12643:
URL: https://github.com/apache/hudi/pull/12643


   ### Change Logs
   
   In https://github.com/apache/hudi/pull/12376 we attempted to revert the 
single file slice optimisation, so that computations such as 
getRecordsByKeys run on the executors even when only a single file slice is 
involved. As a result, when files are listed using the metadata files index, 
the listing runs on an executor even if the data partition has only one file 
slice, and the request is sent to the timeline server 
(RemoteFileSystemView). However, we noticed that the timeline server did not 
respond and the request timed out when bootstrapping a MOR table with 
multiple partition fields.
   
   This PR reverts the single file slice optimisation and also fixes the test 
failure in TestBootstrapRead.testBootstrapFunctional. The test was failing 
because all Spark threads were used up by the list calls for the partition.
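   
   For readers unfamiliar with the trade-off, the following is a minimal 
sketch of what a single file slice shortcut looks like in general. It is not 
Hudi's actual code: the class, the `readSlice` helper, the 
`singleSliceShortcut` flag and the in-memory maps standing in for file slices 
are all hypothetical. With the shortcut enabled, a lookup over one slice runs 
on the calling thread; with it disabled, every lookup is submitted to the 
worker pool, so a small pool can be fully occupied by such calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Hudi.
final class SingleSliceLookupSketch {

    // Stand-in for reading the requested keys from one file slice.
    static Map<String, String> readSlice(Map<String, String> slice, List<String> keys) {
        Map<String, String> found = new HashMap<>();
        for (String key : keys) {
            if (slice.containsKey(key)) {
                found.put(key, slice.get(key));
            }
        }
        return found;
    }

    static Map<String, String> lookup(List<Map<String, String>> slices,
                                      List<String> keys,
                                      ExecutorService pool,
                                      boolean singleSliceShortcut) {
        // Shortcut: a single slice is read on the caller's thread,
        // without occupying a worker thread.
        if (singleSliceShortcut && slices.size() == 1) {
            return readSlice(slices.get(0), keys);
        }
        // Otherwise every slice becomes one task on the worker pool.
        List<Future<Map<String, String>>> futures = new ArrayList<>();
        for (Map<String, String> slice : slices) {
            futures.add(pool.submit(() -> readSlice(slice, keys)));
        }
        Map<String, String> merged = new HashMap<>();
        try {
            for (Future<Map<String, String>> future : futures) {
                merged.putAll(future.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lookup failed", e);
        }
        return merged;
    }
}
```

   Both paths return the same records; they differ only in where the work 
runs, which is what matters when worker threads are scarce.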
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


