prashantwason opened a new pull request #2494:
URL: https://github.com/apache/hudi/pull/2494
## What is the purpose of the pull request
Improves the performance of key lookups from the Metadata Table.
In my scale testing with 150 partitions and 100K+ files on HDFS, the average
time to read a key dropped from 100ms to 10ms, and the total data read from the
HFile dropped from 85MB to 3MB. The base file itself was 3MB, so the file was
effectively read only once, which indicates that the in-memory HFile block
caching was also working.
## Brief change log
1. Cache the KeyScanner across lookups so that the HFile index does not have
to be read for each lookup.
2. Enable block caching in KeyScanner.
3. Move the lock to a limited scope of the code to reduce lock contention.
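
As a rough illustration of items 1 and 3 above, here is a minimal sketch of reusing an expensive-to-open scanner across lookups while keeping the synchronized region small. It is not the code in this PR: `CachedScannerLookup`, the nested `KeyScanner` interface, and the `Supplier`-based factory are hypothetical stand-ins, and it assumes the scanner is not thread-safe, so each seek is still guarded.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch only; these are not Hudi or HBase classes. */
class CachedScannerLookup {
    /** Stand-in for a scanner whose construction is costly (e.g. it loads a file index). */
    public interface KeyScanner {
        Optional<String> seek(String key);
    }

    private final Supplier<KeyScanner> scannerFactory;
    // Cached across lookups so the costly open happens once.
    private volatile KeyScanner cachedScanner;

    public CachedScannerLookup(Supplier<KeyScanner> scannerFactory) {
        this.scannerFactory = scannerFactory;
    }

    /** Opens the scanner lazily; the lock is taken only when it is not open yet. */
    private KeyScanner scanner() {
        KeyScanner s = cachedScanner;
        if (s == null) {
            synchronized (this) {
                s = cachedScanner;
                if (s == null) {
                    s = scannerFactory.get();
                    cachedScanner = s;
                }
            }
        }
        return s;
    }

    public Optional<String> lookup(String key) {
        // Work that needs no shared state stays outside any lock.
        String normalized = key.trim();
        KeyScanner s = scanner();
        // Assumption: the scanner is not thread-safe, so only the seek is guarded.
        synchronized (s) {
            return s.seek(normalized);
        }
    }

    public static void main(String[] args) {
        Map<String, String> data = new TreeMap<>();
        data.put("partition-1", "file-a,file-b");
        AtomicInteger opens = new AtomicInteger();
        CachedScannerLookup lookup = new CachedScannerLookup(() -> {
            opens.incrementAndGet(); // counts how often the scanner is opened
            return k -> Optional.ofNullable(data.get(k));
        });
        System.out.println(lookup.lookup("partition-1"));
        System.out.println(lookup.lookup("partition-1"));
        System.out.println("scanner opened " + opens.get() + " time(s)");
    }
}
```

The double-checked open keeps the common path (scanner already cached) free of the outer lock, which is one way to reduce contention when many lookups run concurrently.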
## Verify this pull request
This pull request is already covered by existing tests, such as
`TestHoodieBackedMetadata`, which can be run with:
mvn test -pl hudi-client/hudi-spark-client -Dtest=TestHoodieBackedMetadata
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org