[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385649#comment-15385649
]
ramkrishna.s.vasudevan commented on HBASE-14921:
------------------------------------------------
[~anastas]
Thanks for your consolidated feedback and thoughts. I really appreciate your
inputs, and I totally agree with your points above, except for
bq. Only in very rare cases, it is known ahead of time that keys never repeat
or being deleted.
I am not very sure about this. Do you mean that most cases will have duplicates?
We have seen use cases where there are not many duplicates and each row is
unique, for example a time-based row key implementation.
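To illustrate that point (the class and method names below are my own, not from the patch or from HBase itself), a time-based row key scheme produces a unique key per write, so in-memory compaction would find no duplicates to eliminate:

```java
// Hypothetical sketch: a time-based row key in which every write is unique,
// so an in-memory compaction has no duplicate versions to drop.
public class TimeSeriesRowKey {
    // Build a row key of the form "<sensorId>:<reversedTimestamp>".
    // Subtracting from Long.MAX_VALUE makes newer rows sort first in
    // HBase's lexicographic key order (a common time-series pattern).
    public static String rowKey(String sensorId, long epochMillis) {
        return sensorId + ":" + (Long.MAX_VALUE - epochMillis);
    }

    public static void main(String[] args) {
        // Two writes from the same sensor at different times yield
        // distinct keys, hence no duplicates in the memstore.
        String k1 = rowKey("sensor-42", 1_000L);
        String k2 = rowKey("sensor-42", 2_000L);
        System.out.println(!k1.equals(k2)); // prints "true"
    }
}
```

Under such a schema every memstore entry is live and unique, so the duplicate-elimination part of in-memory compaction does no useful work.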
bq.Whether there are duplicates or not, you are going to do the same minor
compaction anyway on the disk just to reduce the number of files.
Yes, minor compaction on disk is a bottleneck because of IO. But in the case
where you have very few duplicates you are doing that operation twice, once in
memory and once on disk. This patch does not claim that the disk minor
compaction can be skipped because an in-memory compaction has already been done.
Coming to deletes, there are use cases where deletes occur but are very rare. So
even when the in-memory compaction removes such deletes (if any are
encountered), it produces a flush that is only slightly smaller in size, and the
minor compaction will still be performed on this file as well.
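As a rough sketch of that argument (illustrative only; the names below are mine and this is not HBase's actual compaction code), filtering out a handful of deleted cells barely shrinks the segment that gets flushed:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: when deletes are rare, dropping deleted cells
// during in-memory compaction shrinks the flushed data only slightly,
// so the on-disk minor compaction still processes a file of almost
// the same size.
public class RareDeletesSketch {
    // Keep every cell that has no matching delete marker.
    public static List<String> compactInMemory(List<String> cells,
                                               List<String> deleted) {
        List<String> out = new ArrayList<>();
        for (String c : cells) {
            if (!deleted.contains(c)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cells = List.of("r1", "r2", "r3", "r4", "r5");
        List<String> deleted = List.of("r3"); // deletes are rare: 1 of 5
        // The flush shrinks from 5 cells to 4, a marginal saving.
        System.out.println(compactInMemory(cells, deleted).size()); // prints "4"
    }
}
```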
bq.At least you should do a homework and present me the clear cut performance
evidences that the in-memory-compaction of flattened segments is not effective
in the average case ("default case" as you call it).
The onus is on me to do this and to come up with results.
For now we have done the following:
-> After the first version of the compacting memstore went in, we started
testing it and found some issues that made us think pipeline creation was
costly. Later, fixing those bugs helped overcome this part and we got a better
result.
-> With only flattening 'ON' by default and with the offheap memstore, we could
clearly see a better G1GC mixed-GC average time: it was reduced from 0.6 sec to
0.2 sec. So we are 100% sure flattening is needed. Remember that in this impl
the entire pipeline of segments is flushed.
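The flattening idea can be sketched minimally as follows (this is my own simplified illustration, not HBase's actual CellArrayMap code): a segment backed by a concurrent skip list is rewritten as a flat sorted array. Skip-list nodes cost several heap objects per entry, while the flat array costs almost none, which is the kind of object-count reduction that shortens G1GC mixed collections.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Minimal sketch of segment flattening (illustrative, not HBase code):
// copy a skip-list-backed segment into a flat sorted array, trading
// mutability for far fewer heap objects per entry.
public class FlattenSketch {
    public static String[][] flatten(ConcurrentSkipListMap<String, String> active) {
        String[][] flat = new String[active.size()][2];
        int i = 0;
        // entrySet() iterates in ascending key order, so the array
        // comes out already sorted and can be binary-searched.
        for (Map.Entry<String, String> e : active.entrySet()) {
            flat[i][0] = e.getKey();
            flat[i][1] = e.getValue();
            i++;
        }
        return flat;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> segment = new ConcurrentSkipListMap<>();
        segment.put("rowB", "v2");
        segment.put("rowA", "v1");
        String[][] flat = flatten(segment);
        System.out.println(flat[0][0]); // prints "rowA"
    }
}
```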
-> Regarding your point about whether the default case with in-memory
compaction and flattening has an impact or not, I can get you the numbers. If
it does not impact the perf, then we can definitely go with your design, no
problem with that.
[[email protected]] and [~anoop.hbase] can comment more on the use-case part
and on whether anything is being missed here.
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch,
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)