prashantwason opened a new pull request #2494:
URL: https://github.com/apache/hudi/pull/2494


   
   ## What is the purpose of the pull request
   
   Improves the performance of key lookups from the Metadata Table.
   
   In my scale testing with 150 partitions and 100K+ files on HDFS, the average time to read a key dropped from 100ms to 10ms, and the total data read from the HFile dropped from 85MB to 3MB. Since the base file itself was 3MB, the file was read roughly once in total, which indicates that the in-memory HFile block caching was also working.
   
   ## Brief change log
   
   1. Cache the KeyScanner across lookups so that the HFile index does not have 
to be read for each lookup.
   2. Enable block caching in KeyScanner.
   3. Move the lock to a limited scope of the code to reduce lock contention.
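
   Items 1 and 3 above can be illustrated with a minimal sketch. This is not the code in this pull request: `KeyScanner` here is a hypothetical one-method interface standing in for the real HFile scanner, and `CachingLookup` and `opener` are names invented for the example. The sketch also assumes the scanner is safe for concurrent reads; if it is not, the `seek` call would need its own guard.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for an HFile key scanner; not the Hudi or HBase API. */
interface KeyScanner {
    Optional<String> seek(String key);
}

/** Caches one scanner across lookups and holds the lock only while opening it. */
class CachingLookup {
    private final Supplier<KeyScanner> opener; // expensive: would read the file index
    private final Object lock = new Object();
    private volatile KeyScanner cached;        // reused by every lookup after the first

    CachingLookup(Supplier<KeyScanner> opener) {
        this.opener = opener;
    }

    Optional<String> get(String key) {
        KeyScanner scanner = cached;
        if (scanner == null) {
            // Limited lock scope: only the one-time open is serialized.
            synchronized (lock) {
                scanner = cached;
                if (scanner == null) {
                    scanner = opener.get();
                    cached = scanner;
                }
            }
        }
        // The lookup itself runs outside the lock (assumes a thread-safe scanner).
        return scanner.seek(key);
    }
}
```

   With this shape, the cost of opening the scanner is paid once rather than per lookup, and concurrent readers contend on the lock only until the first open completes.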
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as TestHoodieBackedMetadata:
   
   mvn test -pl hudi-client/hudi-spark-client -Dtest=TestHoodieBackedMetadata
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.

