[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385639#comment-15385639
]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------
Thank you [~anoop.hbase] and [~ram_krish]! You understand this project so well
and did a thorough code review, not to mention how much I appreciate your deep
HBase knowledge and experience!
I am calling on [~stack] to join our interesting discussions!
I understand your point. You would prefer to flatten the segments without
compaction, because you believe that compaction will, on average, eliminate few
cells, and that even the scan that drives the compaction is costly.
Let me disagree with this point of view and explain myself. Here are my points:
1. Whether compaction is going to eliminate cells or not is unknown until
run-time. We don't want to add (yet another) user-configurable flag for whether
to use compaction or not. Only in very rare cases is it known ahead of time
that keys never repeat or get deleted.
2. For the deletion case, if a key K was inserted and then deleted, we have (at
least) 2 versions of K, one of which can be eliminated. So it is not true that
compaction is useless for deletions.
3. The performance degradation due to the "pre-compaction" scan is yet to be
estimated.
4. Whether there are duplicates or not, you are going to do the same minor
compaction anyway on disk (!) just to reduce the number of files. And there it
is going to cost you much more, due to write amplification, HDFS file
replication, networking, the bounded number of compaction threads, I/O
multiplication, etc.
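Point 2 above can be illustrated with a minimal sketch. This is not the real HBase Cell/Segment API; it is a hypothetical model that assumes the entire history of each key lives in the compacted segment (in real HBase a delete marker must be retained if older versions may still exist in other files):

```java
import java.util.*;
import java.util.stream.*;

public class InMemoryCompactionSketch {
    // Hypothetical simplified cell: a key, a sequence id, and a delete flag.
    record Cell(String key, long seqId, boolean delete) {}

    // Compact a segment: when the newest version of a key is a delete marker,
    // the marker and every older (shadowed) version of that key are dropped.
    // Simplifying assumption: no older versions of the key exist outside this segment.
    static List<Cell> compact(List<Cell> segment) {
        Map<String, List<Cell>> byKey = segment.stream()
            .collect(Collectors.groupingBy(Cell::key, TreeMap::new, Collectors.toList()));
        List<Cell> out = new ArrayList<>();
        for (List<Cell> versions : byKey.values()) {
            versions.sort(Comparator.comparingLong(Cell::seqId).reversed());
            if (!versions.get(0).delete()) {
                out.addAll(versions);  // newest version is live: keep them all
            }                          // else: eliminate marker + shadowed versions
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> segment = List.of(
            new Cell("K", 1, false),   // K inserted
            new Cell("K", 2, true),    // K deleted: 2 versions of K, both eliminable here
            new Cell("Q", 3, false));  // unrelated live cell
        System.out.println(compact(segment)); // only Q survives the compaction
    }
}
```

Note that nothing in this sketch is known before run-time (point 1): whether any cell can be eliminated depends entirely on the data that happens to be in the segment.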
So we can think about compaction application policies: apply it only once in a
while, etc.
But simply disregarding the great opportunity to compact in memory... I don't
think it is wise.
At the very least, you should do the homework and present me with clear-cut
performance evidence that in-memory compaction of flattened segments is not
effective in the average case (the "default case", as you call it).
Please note that in-memory compaction is actually about more than saving some
RAM space, especially when it comes together with flattening.
We now hold more data in memory and thus have a better chance of letting a
cell "die" in memory.
As I explained in point 4, this saves many more resources than just space in
RAM.
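The resource argument from point 4 can be made concrete with a back-of-the-envelope sketch. All numbers below (cell size, replication factor, rewrite count) are hypothetical, chosen only to illustrate the write amplification of a cell that reaches disk versus one that dies in memory:

```java
public class WriteAmplificationSketch {
    // Total bytes written to disk for one cell that is NOT eliminated in memory:
    // one flush write plus one write per minor-compaction rewrite, each of them
    // replicated across HDFS datanodes.
    static long diskBytesWritten(long cellBytes, int replication, int rewrites) {
        return cellBytes * replication * (1 + rewrites);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 100-byte cell, HDFS replication 3,
        // rewritten by 4 minor compactions over its lifetime.
        System.out.println("cell eliminated in memory: 0 bytes written to disk");
        System.out.println("cell flushed to disk: "
            + diskBytesWritten(100, 3, 4) + " bytes written"); // 100 * 3 * 5 = 1500
    }
}
```

Under these assumed numbers, a single 100-byte cell that dies in memory avoids 1500 bytes of disk writes, plus the associated networking and compaction-thread time.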
Please convince me where I am wrong :)
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch,
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch,
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)