nbalajee opened a new pull request #1077: [HUDI-335] : Improvements to DiskbasedMap
URL: https://github.com/apache/incubator-hudi/pull/1077

## What is the purpose of the pull request

DiskBasedMap is used by ExternalSpillableMap to write (K, V) pairs to a file while keeping only (K, fileMetadata) in memory, which reduces the in-memory footprint of each record stored on disk.

This change improves the performance of record get/read (random and sequential) and put/write operations from/to disk by introducing a data buffer/cache.

Before the performance improvement:

| RecordsHandled | totalTestTime | writeTime | readTime |
|---|---|---|---|
| 10000 | 3145 | 1176 | 255 |
| 50000 | 5775 | 4187 | 1175 |
| 100000 | 10570 | 7718 | 2203 |
| 500000 | 59723 | 45618 | 11093 |
| 1000000 | 120022 | 87918 | 22355 |
| 2000000 | 258627 | 187185 | 56431 |

After the improvement:

| RecordsHandled | totalTestTime | writeTime | seqReadTime | randReadTime |
|---|---|---|---|---|
| 10000 | 1551 | 531 | 122 | 125 |
| 50000 | 1371 | 420 | 179 | 250 |
| 100000 | 1895 | 535 | 181 | 512 |
| 500000 | 8838 | 2031 | 1128 | 2580 |
| 1000000 | 16147 | 4059 | 1634 | 5293 |
| 2000000 | 34090 | 8337 | 3163 | 10694 |

## Brief change log

- Use BufferedRandomAccessFile instead of RandomAccessFile in the read path.
- Use BufferedOutputStream in the write path.

## Verify this pull request

This pull request is already covered by existing tests, such as TestDiskBasedMap:testSimpleInsert.

## Committer checklist

- [x] Has a corresponding JIRA in PR title & commit: https://issues.apache.org/jira/browse/HUDI-335
- [x] Commit message is descriptive of the change
- [x] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
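
For readers unfamiliar with the pattern, the following is a minimal sketch of the idea described in the purpose and change log above: values are appended to a file through a `BufferedOutputStream`, and only a key to (offset, length) index stays in memory. It is not the code in this pull request. The class name `SpillSketch`, its methods, and the 64 KiB buffer size are hypothetical, and the read side uses the JDK's plain `RandomAccessFile`, because `BufferedRandomAccessFile` is not part of the JDK.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Hudi's DiskBasedMap): values on disk, key -> {offset, length} in memory. */
final class SpillSketch implements AutoCloseable {
  private final File file;
  private final OutputStream out;                             // buffered, append-only write path
  private final Map<String, long[]> index = new HashMap<>();  // key -> {offset, length}
  private long writePos = 0;                                  // next append offset in the file
  private RandomAccessFile reader;                            // opened lazily on first get()

  SpillSketch(File file) {
    this.file = file;
    try {
      // Assumed buffer size of 64 KiB; small writes are batched before hitting the file.
      this.out = new BufferedOutputStream(new FileOutputStream(file), 64 * 1024);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Convenience factory that backs the map with a temporary file. */
  static SpillSketch onTempFile() {
    try {
      File f = File.createTempFile("spill-sketch", ".data");
      f.deleteOnExit();
      return new SpillSketch(f);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Appends the value and records where it lives; a repeated key points at the newest copy. */
  void put(String key, String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    try {
      out.write(bytes); // may stay in the buffer until it fills or is flushed
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    index.put(key, new long[] {writePos, bytes.length});
    writePos += bytes.length;
  }

  /** Random read by key; returns null when the key was never written. */
  String get(String key) {
    long[] entry = index.get(key);
    if (entry == null) {
      return null;
    }
    try {
      out.flush(); // make buffered bytes visible to the reader before seeking
      if (reader == null) {
        reader = new RandomAccessFile(file, "r");
      }
      byte[] bytes = new byte[(int) entry[1]];
      reader.seek(entry[0]);
      reader.readFully(bytes);
      return new String(bytes, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void close() {
    try {
      out.close();
      if (reader != null) {
        reader.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    try (SpillSketch map = SpillSketch.onTempFile()) {
      map.put("k1", "v1");
      map.put("k2", "v2");
      System.out.println(map.get("k2"));
    }
  }
}
```

The sketch flushes before every read to keep it short; a real implementation would likely track whether unflushed writes exist and would also buffer the read side, which is what the change log's read-path item addresses.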
