[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385690#comment-15385690
]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------
bq. Am not very sure on this. You mean most of the cases will have duplicates?
There are use cases we have seen where there is not much duplicates and each
row is unique. Say in a time based row key impl.
No, I do not mean that most cases will have duplicates. I am sure there are
cases with no duplicates at all. I mean, for example, cases where there are
periods of time with more duplicates and periods with fewer, and where this is
not clearly known ahead of time. Usually, use cases with no duplicates at all
and those with lots of duplicates are rare. I just think that 10-15% of
duplicates should be worth compacting...
bq. Yes minor compaction on the disk is a bottleneck because of IO. But in the
case where you have very less duplicates you are doing that operation twice,
once in memory and once in disk. This patch is not going to say that since
memory compaction has been done avoid disk minor compaction. Coming to deletes,
there are use cases where the deletes are there but very rare. So even when the
in memory compaction is going to remove such deletes ( if it is encountered)
that is going to create a flush which is going to be slightly lesser in size but
again the minor compaction will be performed on this file also.
I agree with you that without duplicates in-memory compaction is unnecessary. I
just wanted to show that with few duplicates you gain more than just space in
memory.
The results are very interesting. On which version exactly was the estimation
done? On my previous patch? Let me give you a new, updated patch today.
Thank you, Ramkrishna!
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch,
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations