ramkrish86 opened a new pull request #2747:
URL: https://github.com/apache/hbase/pull/2747


   I closed the original PR because of problems with toggling between my Linux
and Windows environments, and created this new PR, which compiles.
   This version of the patch introduces an interface, ContiguousCellFormat, for
cells whose data is laid out contiguously in the KeyValue serialization format.
   It aims to minimize branching when all cells are pure KeyValues or all are
pure ByteBufferKeyValues. With this patch, a JMH-like test that adds more than
100MB of data to a memstore-like ConcurrentSkipListMap (CSLM) shows an
improvement of more than 50% when all the cells are pure KeyValues.
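
   To illustrate the idea, here is a minimal, self-contained sketch. It is not
the patch's code and not HBase's API: ArrayKv and ContiguousKvComparator are
hypothetical names. It assumes both cells are backed by a byte[] in the KeyValue
serialization layout, so the comparator can read row, family, qualifier,
timestamp and type directly by offset in one pass, with no per-field dispatch on
the cell implementation. The ordering is assumed to follow the usual KeyValue
order (unsigned lexicographic row, family and qualifier, then newer timestamps
first, then higher type codes first); tags and special keys are ignored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical array-backed cell in the KeyValue serialization layout:
// [keyLen:int][valueLen:int][rowLen:short][row][famLen:byte][family]
// [qualifier][timestamp:long][type:byte][value]
final class ArrayKv {
    final byte[] buf;
    final int off;

    ArrayKv(byte[] buf, int off) {
        this.buf = buf;
        this.off = off;
    }

    // Example helper: serializes one cell into a fresh array at offset 0.
    static ArrayKv of(String row, String family, String qualifier, long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLen = 2 + r.length + 1 + f.length + q.length + 8 + 1;
        ByteBuffer bb = ByteBuffer.allocate(8 + keyLen + v.length); // big-endian by default
        bb.putInt(keyLen).putInt(v.length).putShort((short) r.length).put(r)
          .put((byte) f.length).put(f).put(q).putLong(ts).put(type).put(v);
        return new ArrayKv(bb.array(), 0);
    }
}

// Hypothetical comparator for cells known to be contiguous and array-backed.
final class ContiguousKvComparator implements Comparator<ArrayKv> {
    static final ContiguousKvComparator INSTANCE = new ContiguousKvComparator();

    @Override
    public int compare(ArrayKv a, ArrayKv b) {
        byte[] x = a.buf, y = b.buf;
        int ko1 = a.off + 8, ko2 = b.off + 8;          // key starts after the two length ints
        int rl1 = readShort(x, ko1), rl2 = readShort(y, ko2);
        int c = Arrays.compareUnsigned(x, ko1 + 2, ko1 + 2 + rl1, y, ko2 + 2, ko2 + 2 + rl2);
        if (c != 0) return c;                          // rows differ
        int fo1 = ko1 + 2 + rl1, fo2 = ko2 + 2 + rl2;  // family-length byte
        int fl1 = x[fo1] & 0xff, fl2 = y[fo2] & 0xff;
        c = Arrays.compareUnsigned(x, fo1 + 1, fo1 + 1 + fl1, y, fo2 + 1, fo2 + 1 + fl2);
        if (c != 0) return c;                          // families differ
        // Qualifier spans from the end of the family to the trailing 9 bytes (ts + type).
        int qo1 = fo1 + 1 + fl1, qo2 = fo2 + 1 + fl2;
        int qe1 = ko1 + readInt(x, a.off) - 9, qe2 = ko2 + readInt(y, b.off) - 9;
        c = Arrays.compareUnsigned(x, qo1, qe1, y, qo2, qe2);
        if (c != 0) return c;                          // qualifiers differ
        c = Long.compare(readLong(y, qe2), readLong(x, qe1));
        if (c != 0) return c;                          // newer timestamp sorts first
        return (y[qe2 + 8] & 0xff) - (x[qe1 + 8] & 0xff); // higher type code sorts first
    }

    private static int readShort(byte[] b, int i) {
        return ((b[i] & 0xff) << 8) | (b[i + 1] & 0xff);
    }

    private static int readInt(byte[] b, int i) {
        return ((b[i] & 0xff) << 24) | ((b[i + 1] & 0xff) << 16)
            | ((b[i + 2] & 0xff) << 8) | (b[i + 3] & 0xff);
    }

    private static long readLong(byte[] b, int i) {
        long v = 0;
        for (int k = 0; k < 8; k++) {
            v = (v << 8) | (b[i + k] & 0xff);
        }
        return v;
    }
}
```

The point of the sketch is the single straight-line pass over two arrays; a
real implementation would also need a ByteBuffer-backed variant and a fallback
for mixed cell types.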
   
   We did some cluster testing with KeyValue as the only cell type and with no
data block encoding (DBE). More tests may be needed to ensure we don't break
anything.
   Apart from the ContiguousCellComparator, this commit addresses another
finding: bulk load performance remained slower, even though the comparator
changes improved overall comparator performance by more than 15%.
   The reason is that PutSortReducer receives a given row with all the cells
for that row and writes them to the HFile, so effectively only one row at a
time is added to the map. Even when a row has 300 cells, the optimization we
expect from the ContiguousCellComparator changes does not kick in, because of
the branches that still remain in the code and because the number of cells is
still too small for the optimization to pay off.
   For those cases, if we bring back KVComparator (currently deprecated; see
the PutSortReducer changes in the patch) and use it specifically for bulk-load
style workloads, we perform 15% faster than the 1.3 branch. This is in line with
what we are trying to do in https://issues.apache.org/jira/browse/HBASE-24754.
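
   A simplified model of the per-row flow described above, as a sketch only:
SimpleCell and RowSorter are hypothetical names, not the HBase or Hadoop
classes, and DEFAULT_ORDER is an assumed KeyValue-like order. All cells of one
row go into a TreeSet built with an injected comparator, which is the point
where a bulk-load-specific comparator such as KVComparator could be plugged in.
Because every cell shares the same row, each comparison ties on the row and
falls through to the remaining fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified cell: only the fields needed for ordering.
final class SimpleCell {
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Assumed KeyValue-like order: row, family, qualifier as unsigned bytes,
    // then newer timestamps first.
    static final Comparator<SimpleCell> DEFAULT_ORDER =
        Comparator.comparing((SimpleCell c) -> c.row, UNSIGNED)
            .thenComparing((SimpleCell c) -> c.family, UNSIGNED)
            .thenComparing((SimpleCell c) -> c.qualifier, UNSIGNED)
            .thenComparing(Comparator.comparingLong((SimpleCell c) -> c.ts).reversed());

    final byte[] row, family, qualifier;
    final long ts;

    SimpleCell(byte[] row, byte[] family, byte[] qualifier, long ts) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.ts = ts;
    }

    static SimpleCell of(String row, String family, String qualifier, long ts) {
        return new SimpleCell(row.getBytes(StandardCharsets.UTF_8),
            family.getBytes(StandardCharsets.UTF_8),
            qualifier.getBytes(StandardCharsets.UTF_8), ts);
    }
}

// Hypothetical per-row sorter; the comparator is injected so a bulk-load
// path can choose a cheaper one.
final class RowSorter {
    private final Comparator<SimpleCell> comparator;

    RowSorter(Comparator<SimpleCell> comparator) {
        this.comparator = comparator;
    }

    // Sorts the cells of a single row. Cells that compare as equal are kept
    // only once, because a TreeSet is a set.
    List<SimpleCell> sortRow(List<SimpleCell> cells) {
        TreeSet<SimpleCell> sorted = new TreeSet<>(comparator);
        sorted.addAll(cells);
        return new ArrayList<>(sorted);
    }
}
```
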
   I can open a discussion thread on dev@ with all the details so that others
can chime in.
   @anoopsjohn , @saintstack - FYI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
