nbalajee opened a new pull request #1077: [HUDI-335] : Improvements to 
DiskbasedMap
URL: https://github.com/apache/incubator-hudi/pull/1077
 
 
   ## What is the purpose of the pull request
   DiskBasedMap is used by ExternalSpillableMap to write (K, V) pairs to a
file while keeping only (K, fileMetadata) in memory, which reduces the
in-memory footprint of each record stored on disk.
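
As a rough illustration of that layout, here is a minimal sketch of a map that appends values to a file and keeps only the key plus an (offset, length) entry in memory. It is not Hudi's DiskBasedMap: the class `TinyDiskMap`, the nested `Entry` type, and the UTF-8 string encoding are all hypothetical and chosen only to keep the example short.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the Hudi implementation.
final class TinyDiskMap implements AutoCloseable {

    // In-memory metadata for one value: where it starts in the file and its length.
    private static final class Entry {
        final long offset;
        final int length;

        Entry(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final Map<String, Entry> index = new HashMap<>();
    private final RandomAccessFile file;

    TinyDiskMap(File backingFile) throws IOException {
        this.file = new RandomAccessFile(backingFile, "rw");
    }

    // Append the value to the end of the file and remember only its location.
    void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.write(bytes);
        index.put(key, new Entry(offset, bytes.length));
    }

    // Look up the location in memory, then read the value back from disk.
    String get(String key) throws IOException {
        Entry entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] bytes = new byte[entry.length];
        file.seek(entry.offset);
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Every `put` and `get` in this sketch goes straight to an unbuffered RandomAccessFile, which is the kind of access pattern that buffering is meant to speed up.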
   
     This change improves the performance of record get/read (both random
and sequential reads) and put/write operations from/to disk by introducing a
data buffer/cache.
   
     Before the performance improvement:

   | RecordsHandled | totalTestTime | writeTime | readTime |
   |---------------:|--------------:|----------:|---------:|
   |          10000 |          3145 |      1176 |      255 |
   |          50000 |          5775 |      4187 |     1175 |
   |         100000 |         10570 |      7718 |     2203 |
   |         500000 |         59723 |     45618 |    11093 |
   |        1000000 |        120022 |     87918 |    22355 |
   |        2000000 |        258627 |    187185 |    56431 |
   
     After the improvement:

   | RecordsHandled | totalTestTime | writeTime | seqReadTime | randReadTime |
   |---------------:|--------------:|----------:|------------:|-------------:|
   |          10000 |          1551 |       531 |         122 |          125 |
   |          50000 |          1371 |       420 |         179 |          250 |
   |         100000 |          1895 |       535 |         181 |          512 |
   |         500000 |          8838 |      2031 |        1128 |         2580 |
   |        1000000 |         16147 |      4059 |        1634 |         5293 |
   |        2000000 |         34090 |      8337 |        3163 |        10694 |
   
   
   ## Brief change log
   
   - Use BufferedRandomAccessFile instead of RandomAccessFile in the read path.
   - Use BufferedOutputStream in the write path.
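
To illustrate the buffering idea with JDK classes only, the sketch below writes length-prefixed records through a BufferedOutputStream and reads them back sequentially through a BufferedInputStream. It is an assumption-laden example, not the code in this PR: the class `BufferedRecordIo`, the 64 KiB buffer size, and the length-prefixed record format are hypothetical. BufferedRandomAccessFile is not a JDK class, so the buffered random-read path is not shown here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and record format are illustrative only.
final class BufferedRecordIo {

    // Arbitrary buffer size chosen for the example.
    private static final int BUFFER_SIZE = 64 * 1024;

    // Write each record as a 4-byte length followed by its UTF-8 bytes.
    // The BufferedOutputStream batches many small writes into fewer disk writes.
    static void writeAll(File file, List<String> records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), BUFFER_SIZE))) {
            for (String record : records) {
                byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } // closing the stream flushes the remaining buffered bytes
    }

    // Read back `count` records sequentially through a read buffer.
    static List<String> readAll(File file, int count) throws IOException {
        List<String> records = new ArrayList<>(count);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), BUFFER_SIZE))) {
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                records.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return records;
    }
}
```

One thing to keep in mind with a buffered write path is that data may still sit in the buffer until it is flushed, so a reader of the same file has to run after a flush or close.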
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as 
   TestDiskBasedMap:testSimpleInsert
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit.
   https://issues.apache.org/jira/browse/HUDI-335

    - [x] Commit message is descriptive of the change

    - [x] CI is green

    - [ ] Necessary doc changes done or have another open PR

    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
