[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221626#comment-15221626 ]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

bq. But one question why is that the init() method position changed now? When a chunk was got from the pool the init was previously happening after the CAS operation but now it is now moved into the allocateChunk itself? Will it have ramifications?

We wanted to concentrate all chunk creation (allocation and initialization) in the ChunkPool, so that the ChunkPool can manage the mapping to the ID. Previously a Chunk was initialized only just before it was directly used. I see your point, [~ram_krish]: as currently implemented, Chunks are also initialized (memory allocated) when they are merely pre-created for the pool. This is not efficient. I’ll fix that.

bq. I read the new classes in this patch. So in which patch this is being used ? Or it will come later?

Thank you, [~anoop.hbase], for taking a look. The CellFlatMap (renamed from CellBlock with [~stack]'s help) is going to be part of ImmutableSegment, so after an in-memory flush the CSLM should be replaced by a CellFlatMap. I am currently writing this code and hope to present it soon. Some intuition for the usage can be found in TestCellBlockSet.

bq. We don't need the last int of Cell length. We have the offset to Cell. See constructor - KeyValue(final byte [] bytes, final int offset)

This is a very good comment! I didn’t think in that direction, but we can enjoy this “super-compact-representation” :)

bq. If we use the Cell[] way, per Cell we have more overhead.

Of course, Cell[] is expensive. It was implemented because it is very simple, easy to debug, and easy to compare with plain byte array serialization. But Cell[] can be useful for very large cells, those bigger than MSLAB Chunks (e.g. > 2MB). If we know we are going to deal with such very large cells and do not want to allocate un-reusable special-size MSLAB Chunks, CellArrayMap (the new name for CellBlockObjectArray) is a good solution.

bq. BTW HBASE-15179, under this we are doing some PoC and test with off heap Memstore.

Please note that as part of this jira we change the MSLAB and MemStoreChunkPool files. We need to align with your code that takes MSLAB off-heap.

bq. The 3 ints per cell also written to chunks we get from same MSLAB. We need this really? So if we change to 8 bytes per cell, and when chunk size is 2 MB, we can have 262144 cells. We will have this many really? If not, we may waste that chunk?

Excellent discussion, [~anoop.hbase]! Those were my thoughts as well… Initially, I wrote CellBlockSerialized (now called CellChunkMap) to accept a byte[] of any size and work with it. Later, however, I thought that this might need to be taken off-heap, and that it may be better to centralize all the off-heaping in the Chunks. Then, if a Chunk is off-heap, everything implemented on top of it is off-heap as well…

Now, if we need just 2 ints per Cell representation (2^3 bytes), we can fit 2^21/2^3 = 2^18 cells in a Chunk of size 2MB. Suppose a cell uses 256 = 2^8 bytes for all its data, which is not much. Do we often serve Cells smaller than that? At that cell size, one Chunk can represent 2^18*2^8 = 2^26 bytes = 64MB, which is already half of what we can hold in one MemStore without flushing to disk. So in 99% of cases we will not need a Chunk[], and a single Chunk is enough.

But what if not? What if we have really small cells, like an integer for the key and an integer for the data? Is that a possible use case? For such small cells the representation of the metadata is actually super-important, as you do not want the metadata to be bigger than the data…

I will continue answering more questions already posted here...
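To make the capacity arithmetic above easy to re-check with other numbers, here is a minimal sketch in Java. It is not code from the patch: the class `ChunkIndexCapacity` and its method names are hypothetical, and the 2 MB chunk size, the 8-byte (2 int) entry and the two cell sizes are simply the figures used in this comment, not HBase constants.

```java
public class ChunkIndexCapacity {
    // Figures assumed from the discussion, not read from HBase configuration.
    static final long CHUNK_SIZE_BYTES = 2L * 1024 * 1024;  // 2 MB = 2^21 bytes
    static final long ENTRY_SIZE_BYTES = 2L * Integer.BYTES; // 2 ints = 2^3 bytes

    /** Number of fixed-size cell entries that fit in one chunk. */
    static long entriesPerChunk(long chunkSizeBytes, long entrySizeBytes) {
        return chunkSizeBytes / entrySizeBytes;
    }

    /** Total cell data referenced by a full chunk of entries. */
    static long bytesIndexed(long entries, long cellSizeBytes) {
        return entries * cellSizeBytes;
    }

    public static void main(String[] args) {
        long entries = entriesPerChunk(CHUNK_SIZE_BYTES, ENTRY_SIZE_BYTES);
        System.out.println("entries per chunk: " + entries);

        // 256-byte cells (the example above) and 8-byte cells (int key + int value).
        for (long cellSize : new long[] {256, 8}) {
            long data = bytesIndexed(entries, cellSize);
            double metadataToData = (double) CHUNK_SIZE_BYTES / data;
            System.out.println("cell size " + cellSize + " B -> data indexed: "
                    + data + " B, metadata/data ratio: " + metadataToData);
        }
    }
}
```

With 256-byte cells one full index chunk covers 64MB of cell data, so the metadata is a small fraction of the data. With 8-byte cells the same chunk covers only 2MB, so the metadata is as large as the data itself, which is the concern raised for very small cells.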
> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, HBASE-14921-V02.patch
>
> Memory optimizations including compressed format representation and offheap allocations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)