[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172035#comment-15172035 ]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

[~stack] and [~anoop.hbase], a big thank you for your great comments! Please see my answers below.

bq. On #2, you have heard the rumor that MSLAB may not be needed when running on G1GC (TBD)

What makes you think that using G1GC is better than MSLAB? In my understanding, G1GC does decrease GC pauses, but it does so using parallel programming and more complex algorithms, so you pay in CPU cycles and in memory. Ad hoc memory management is always at least as good as a universal one. So I believe MSLAB still needs to be used (and not only because of the off-heap option) even when G1GC is in use.

bq. So, when you say the MSLAB can be offheap, its ok to have references only in CSLM? We do not want to be copying data across the onheap/offheap boundary if it can be avoided.

When MSLAB goes off-heap, there is no copying of data across the on-heap/off-heap boundary! There is a copy only at the beginning, when data arrives on-heap and needs to be copied down into off-heap MSLAB chunks, and then at the end, when flushing to disk: as I see it, the HFile writer still uses an on-heap byte stream, so there is no option but to copy back from off-heap to on-heap. And about having references only in the CSLM, what do you mean? No need for CellBlocks? Or do you want the entire Cell object to be pushed inside the ConcurrentSkipListMap? Note that references between off-heap and on-heap are fine (no extra performance cost); those accesses are just performed differently.

bq. So, it looks like you are talking of doing at least an extra copy from the original MSLAB to a new Segment MSLAB. Would be cool if a bit of evidence that this extra copy to a Segment, even in the worst case where no purge was possible, cost less than trying to continue with a 'fat' CSLM.

You are totally right; it would be good to have some "compaction predictor" that indicates how much a compaction is needed.
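Returning to the off-heap MSLAB point above, here is a minimal sketch of the two boundary copies: cell bytes arriving on-heap are copied once into a direct ByteBuffer chunk, and copied back on-heap only at flush time for the HFile writer. The class and method names are hypothetical illustrations, not HBase's actual API.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified off-heap MSLAB chunk: cell bytes are stored
// in a direct ByteBuffer; only (chunk, offset, length) references need
// to live on-heap, e.g. inside the CSLM.
public class OffHeapChunkSketch {
    private final ByteBuffer chunk;  // off-heap storage
    private int nextFree = 0;

    OffHeapChunkSketch(int capacity) {
        this.chunk = ByteBuffer.allocateDirect(capacity);
    }

    /** Copies cell bytes off-heap; returns the offset, or -1 if the chunk is full. */
    int copyCellIn(byte[] onHeapCell) {
        if (nextFree + onHeapCell.length > chunk.capacity()) {
            return -1;  // caller would allocate a new chunk
        }
        int offset = nextFree;
        ByteBuffer dup = chunk.duplicate();  // independent position, shared memory
        dup.position(offset);
        dup.put(onHeapCell);
        nextFree += onHeapCell.length;
        return offset;
    }

    /** At flush time, cell bytes are copied back on-heap for the HFile writer. */
    byte[] copyCellOut(int offset, int length) {
        byte[] onHeap = new byte[length];
        ByteBuffer dup = chunk.duplicate();
        dup.position(offset);
        dup.get(onHeap);
        return onHeap;
    }
}
```

In between these two copies, reads can work directly against the direct buffer, which is the "no copying across the boundary" point above.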
We have some thoughts on how it can be done, but it is not a trivial task. In order not to mix everything together right now, we can add such a predictor later, after we have benchmarking for the flat representation and the off-heaping. As you can see, there is a lot to be done; let us take the challenges one by one.

bq. "The compaction process can repeat until we must flush to disk." There will be guards in place to prevent our compacting in-memory when it not possible that a compaction can produce a tighter in-memory representation (no purges possible, etc.)?

Currently we do not have such "guards", and I understand your concern about unneeded or frequent compactions. For now, compaction starts (asynchronously) soon after a flush-in-memory, and we assume flush-in-memory is an infrequent task that "freezes" (makes immutable) a big amount of memory. So the assumption is that in a big amount of memory you have a higher probability of finding something to compact.

bq. When will the compaction getting triggered? Time based and/or #ImmutableSegments in the pipeline?

bq. So am very interested to know when you consider we can compact the CSLM cells into array.

As I have said, currently the compaction is triggered asynchronously after each in-memory flush, if there is no other on-going compaction. The number of ImmutableSegments in the pipeline can also be a trigger. Please note that the compaction happens in the background (!!!), meaning that no one waits for it. It costs you CPU cycles only, and if you lower the priority of the compacting thread, even the CPU cycles should not be an issue. So I would not be too worried about the time spent on copying during compaction. Am I missing something?

bq. So for making the array of Cells we need to know how many cells will survive into the compacted result. So we will do scan over the ImmutableSegments 2 times? To know the #cells and then for actual moving it into array.

No.
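To make the single-scan idea concrete: a hedged sketch (Strings stand in for Cells; the purge rule and all names are hypothetical) in which the destination array is sized for the worst case, and survivors are counted in the same pass that copies the references.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: flatten a CSLM-backed immutable segment into a
// reference array in a single scan. The array is sized for the worst
// case (every cell survives), so no counting pre-pass is needed.
public class FlattenSketch {
    static String[] flatten(ConcurrentSkipListMap<String, String> segment) {
        // worst case: all cells survive the compaction
        String[] cells = new String[segment.size()];
        int survivors = 0;
        for (String cell : segment.keySet()) {   // CSLM iterates in sorted order
            if (!isPurgeable(cell)) {
                cells[survivors++] = cell;       // single pass, count as we go
            }
        }
        // only the first `survivors` slots are used; the unused 8-byte
        // reference slots at the tail are cheap compared to a second scan
        return cells;
    }

    // stand-in purge rule, e.g. expired or shadowed versions in the real thing
    static boolean isPurgeable(String cell) {
        return cell.startsWith("deleted:");
    }
}
```

The trade-off sketched here is the one described above: a few wasted reference slots at the tail of the array in exchange for scanning the ImmutableSegments only once.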
We are going to allocate the array of Cells for the worst case: all the cells survive. Note that a Cell reference takes very little space.

bq. If we know the #cells compacting out and #cells which will get away, we can decide whether it is worth copy to new area or not.

This is also a possibility.

bq. It is not just 8 bytes extra overhead per cell when we have array of cells instead of plain bytes cellblock (as HFile data block)

bq. Ref to cell in array (8 bytes) + Cell Object (16 bytes) + ref to byte with Cell (8) + offset and length ints (8) = 40 bytes per cell.

OK. Note that when you have a plain-bytes cellblock (as in an HFile data block), as in HBASE-10713, you had a TreeMap overhead on top of the plain bytes for the search. So if we are not counting the Cell data in the MSLAB, the per-cell overhead is still good:

- reference to the cell in the array: 8 bytes
- Cell object header: 16 bytes
- reference to the byte[] holding the Cell: 8 bytes
- offset and length ints: 4 + 4 = 8 bytes
- total: 40 bytes per cell

In the CSLM you have about 4 x 40 = 160 bytes of overhead per Cell (again, not counting the Cell data in the MSLAB, which can be 1 KB).

> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>      Attachments: CellBlocksSegmentInMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)