[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385649#comment-15385649
]
ramkrishna.s.vasudevan commented on HBASE-14921:
------------------------------------------------
[~anastas]
Thanks for your consolidated feedback and thoughts. I really appreciate your
inputs, and I totally agree with your points above, except for
bq. Only in very rare cases, it is known ahead of time that keys never repeat
or being deleted.
I am not very sure about this. Do you mean that most cases will have duplicates?
We have seen use cases where there are not many duplicates and each row is
unique, for example a time-based row key implementation.
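To illustrate that point (the class and method names below are my own, not from the patch or from HBase itself), a time-based row key scheme produces a unique key per write, so in-memory compaction would find no duplicates to eliminate:

```java
// Hypothetical sketch: a time-based row key in which every write is unique,
// so an in-memory compaction has no duplicate versions to drop.
public class TimeSeriesRowKey {
    // Build a row key of the form "<sensorId>:<reversedTimestamp>".
    // Subtracting from Long.MAX_VALUE makes newer rows sort first in
    // HBase's lexicographic key order (a common time-series pattern).
    public static String rowKey(String sensorId, long epochMillis) {
        return sensorId + ":" + (Long.MAX_VALUE - epochMillis);
    }

    public static void main(String[] args) {
        // Two writes from the same sensor at different times yield
        // distinct keys, hence no duplicates in the memstore.
        String k1 = rowKey("sensor-42", 1_000L);
        String k2 = rowKey("sensor-42", 2_000L);
        System.out.println(!k1.equals(k2)); // prints "true"
    }
}
```

Under such a schema every memstore entry is live and unique, so the duplicate-elimination part of in-memory compaction does no useful work.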
bq.Whether there are duplicates or not, you are going to do the same minor
compaction anyway on the disk just to reduce the number of files.
Yes, minor compaction on disk is a bottleneck because of IO. But in the case
where you have very few duplicates you are doing that operation twice, once in
memory and once on disk. This patch does not claim that the disk minor
compaction can be skipped because an in-memory compaction has already been done.
Coming to deletes, there are use cases where deletes occur but are very rare. So
even when the in-memory compaction removes such deletes (if any are
encountered), it produces a flush that is only slightly smaller in size, and the
minor compaction will still be performed on this file as well.
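As a rough sketch of that argument (illustrative only; the names below are mine and this is not HBase's actual compaction code), filtering out a handful of deleted cells barely shrinks the segment that gets flushed:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: when deletes are rare, dropping deleted cells
// during in-memory compaction shrinks the flushed data only slightly,
// so the on-disk minor compaction still processes a file of almost
// the same size.
public class RareDeletesSketch {
    // Keep every cell that has no matching delete marker.
    public static List<String> compactInMemory(List<String> cells,
                                               List<String> deleted) {
        List<String> out = new ArrayList<>();
        for (String c : cells) {
            if (!deleted.contains(c)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cells = List.of("r1", "r2", "r3", "r4", "r5");
        List<String> deleted = List.of("r3"); // deletes are rare: 1 of 5
        // The flush shrinks from 5 cells to 4, a marginal saving.
        System.out.println(compactInMemory(cells, deleted).size()); // prints "4"
    }
}
```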
bq.At least you should do a homework and present me the clear cut performance
evidences that the in-memory-compaction of flattened segments is not effective
in the average case ("default case" as you call it).
The onus is on me to do this and to come up with results.
For now we have done the following:
-> After the first version of the compacting memstore went in, we started
testing it and found some issues that made us think pipeline creation was
costly. Later, fixing those bugs helped overcome this part and we got a better
result.
-> With only flattening 'ON' by default and with the offheap memstore, we could
clearly see a better G1GC mixed-GC average time: it was reduced from 0.6 sec to
0.2 sec. So we are 100% sure flattening is needed. Remember that in this impl
the entire pipeline of segments is flushed.
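The flattening idea can be sketched minimally as follows (this is my own simplified illustration, not HBase's actual CellArrayMap code): a segment backed by a concurrent skip list is rewritten as a flat sorted array. Skip-list nodes cost several heap objects per entry, while the flat array costs almost none, which is the kind of object-count reduction that shortens G1GC mixed collections.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Minimal sketch of segment flattening (illustrative, not HBase code):
// copy a skip-list-backed segment into a flat sorted array, trading
// mutability for far fewer heap objects per entry.
public class FlattenSketch {
    public static String[][] flatten(ConcurrentSkipListMap<String, String> active) {
        String[][] flat = new String[active.size()][2];
        int i = 0;
        // entrySet() iterates in ascending key order, so the array
        // comes out already sorted and can be binary-searched.
        for (Map.Entry<String, String> e : active.entrySet()) {
            flat[i][0] = e.getKey();
            flat[i][1] = e.getValue();
            i++;
        }
        return flat;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> segment = new ConcurrentSkipListMap<>();
        segment.put("rowB", "v2");
        segment.put("rowA", "v1");
        String[][] flat = flatten(segment);
        System.out.println(flat[0][0]); // prints "rowA"
    }
}
```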
-> Regarding your point about whether the default case with in-memory
compaction and flattening has an impact or not, I can get you the numbers. If
it does not impact the perf, then we can definitely go with your design, no
problem with that.
[[email protected]] and [~anoop.hbase] can comment more on the use-case part
and on whether anything is being missed here.
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch,
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)