[jira] [Commented] (HBASE-14921) Memory optimizations

Anastasia Braginsky (JIRA) Fri, 01 Apr 2016 05:26:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221626#comment-15221626
 ]


Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

bq. But one question: why has the position of the init() method changed? When a 
chunk was taken from the pool, init was previously called after the CAS 
operation, but now it has moved into allocateChunk itself. Will that have 
ramifications?

We wanted to concentrate all chunk creation (allocation and initialization) in 
the ChunkPool, so that the ChunkPool can manage the mapping between chunk IDs and 
Chunks. Previously a Chunk was initialized only just before it was going to be 
used. I see your point [~ram_krish]: as currently implemented, Chunks are also 
initialized (their memory allocated) when they are merely pre-created for the 
pool. This is not efficient. I'll fix that.
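
A minimal sketch of the intended fix, assuming a pool that registers each chunk's ID when the chunk is created but defers the memory allocation until the chunk is actually handed out. The names LazyChunk, LazyChunkPool, getChunk and byId are hypothetical and are not the patch's API.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical chunk: its ID is assigned eagerly, its memory lazily.
class LazyChunk {
  final int id;
  final int size;
  private volatile ByteBuffer data; // null until init() is called

  LazyChunk(int id, int size) { this.id = id; this.size = size; }

  // Allocate the backing memory on first use only; repeated calls are no-ops.
  synchronized void init() {
    if (data == null) {
      data = ByteBuffer.allocate(size);
    }
  }

  boolean isInitialized() { return data != null; }
}

// Hypothetical pool that owns chunk creation and the ID -> chunk mapping.
class LazyChunkPool {
  private final int chunkSize;
  private final AtomicInteger nextId = new AtomicInteger(1);
  private final Map<Integer, LazyChunk> idToChunk = new ConcurrentHashMap<>();
  private final ConcurrentLinkedQueue<LazyChunk> free = new ConcurrentLinkedQueue<>();

  LazyChunkPool(int chunkSize, int preCreate) {
    this.chunkSize = chunkSize;
    for (int i = 0; i < preCreate; i++) {
      free.add(create()); // pre-created: ID registered, no memory allocated yet
    }
  }

  private LazyChunk create() {
    LazyChunk c = new LazyChunk(nextId.getAndIncrement(), chunkSize);
    idToChunk.put(c.id, c);
    return c;
  }

  // Hand out a chunk, allocating its memory just before it is used.
  LazyChunk getChunk() {
    LazyChunk c = free.poll();
    if (c == null) {
      c = create();
    }
    c.init();
    return c;
  }

  LazyChunk byId(int id) { return idToChunk.get(id); }
}
```

Pre-created chunks cost only an object and a map entry until getChunk() is called, while the pool still knows every ID up front.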

bq. I read the new classes in this patch. So in which patch are they being used? 
Or will that come later?

Thank you [~anoop.hbase] for taking a look. The CellFlatMap (renamed from 
CellBlock with [~stack]'s help) is going to be part of ImmutableSegment: after 
an in-memory flush, the CSLM (ConcurrentSkipListMap) is to be replaced by a 
CellFlatMap. I am currently writing this code and hope to present it soon. Some 
intuition for its usage can be found in TestCellBlockSet.
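
To illustrate the idea, here is a minimal sketch of a flat, immutable sorted map. Once a segment stops accepting writes, the entries of a ConcurrentSkipListMap can be copied into sorted arrays and looked up by binary search, with no per-entry skip-list nodes. It uses String keys in place of HBase Cells, and FlatSortedMap is a hypothetical name. The CellFlatMap in the patch works on Cells and is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical immutable map backed by two parallel sorted arrays.
final class FlatSortedMap<K extends Comparable<K>, V> {
  private final Object[] keys;   // sorted ascending (natural ordering)
  private final Object[] values; // values[i] belongs to keys[i]

  // Snapshot a skip list that no longer receives writes; its iteration order is sorted.
  FlatSortedMap(ConcurrentSkipListMap<K, V> frozen) {
    List<Map.Entry<K, V>> entries = new ArrayList<>(frozen.entrySet());
    keys = new Object[entries.size()];
    values = new Object[entries.size()];
    for (int i = 0; i < entries.size(); i++) {
      keys[i] = entries.get(i).getKey();
      values[i] = entries.get(i).getValue();
    }
  }

  // Exact lookup by binary search over the sorted key array.
  @SuppressWarnings("unchecked")
  V get(K key) {
    int idx = Arrays.binarySearch(keys, key);
    return idx >= 0 ? (V) values[idx] : null;
  }

  // Smallest key >= the given key, or null (same contract as NavigableMap.ceilingKey).
  @SuppressWarnings("unchecked")
  K ceilingKey(K key) {
    int idx = Arrays.binarySearch(keys, key);
    if (idx < 0) {
      idx = -idx - 1; // convert to the insertion point
    }
    return idx < keys.length ? (K) keys[idx] : null;
  }

  int size() { return keys.length; }
}
```

The snapshot is taken once per in-memory flush, so its O(n) copy is paid once and every later lookup avoids the skip list's pointer chasing.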

bq. We don't need the last int of Cell length. We have the offset to Cell. See 
constructor - KeyValue(final byte [] bytes, final int offset)

This is a very good comment! I hadn't thought in that direction, but we can enjoy 
this “super-compact representation” :)
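
Why the length field is redundant: a serialized KeyValue begins with two 4-byte big-endian ints, the key length and the value length, so its total size can be derived from the offset alone. A minimal sketch, assuming a KeyValue without tags (with tags, a 2-byte tags length and the tags follow the value). KeyValueLength.serializedLength is a hypothetical helper, not an HBase method.

```java
import java.nio.ByteBuffer;

final class KeyValueLength {
  // Bytes taken by the key-length and value-length ints that prefix every KeyValue.
  static final int KV_INFRASTRUCTURE_SIZE = 2 * Integer.BYTES;

  // Derive the total serialized length of a tag-less KeyValue starting at `offset`.
  static int serializedLength(byte[] bytes, int offset) {
    ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
    int keyLength = buf.getInt(offset);
    int valueLength = buf.getInt(offset + Integer.BYTES);
    return KV_INFRASTRUCTURE_SIZE + keyLength + valueLength;
  }
}
```

With the length recoverable this way, a per-cell index entry only needs to say where the cell starts.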

bq. If we use the Cell[] way, per Cell we have more overhead.

Of course, Cell[] is expensive. It was implemented because it is very simple, 
easy to debug, and easy to compare with plain byte-array serialization. But 
Cell[] can be useful for very large cells, those bigger than MSLAB Chunks (e.g., 
larger than 2MB). If we know we are going to deal with such very large cells and 
do not want to allocate non-reusable, special-size MSLAB Chunks, CellArrayMap 
(the new name for CellBlockObjectArray) is a good solution.
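
A minimal sketch of this trade-off, assuming the choice is made per segment from the sizes of its cells. The comment above does not say how the choice would actually be made, so IndexKind, IndexChooser and the "any cell larger than a chunk" rule are all hypothetical.

```java
// Hypothetical index representations for an immutable segment.
enum IndexKind { CELL_ARRAY_MAP, CELL_CHUNK_MAP }

final class IndexChooser {
  // Returns CELL_ARRAY_MAP if any cell is larger than an MSLAB chunk. Such a
  // cell cannot be copied into a regular chunk, so keeping plain Cell
  // references avoids allocating special-size chunks that cannot be reused.
  static IndexKind choose(int[] cellSizes, int chunkSize) {
    for (int size : cellSizes) {
      if (size > chunkSize) {
        return IndexKind.CELL_ARRAY_MAP;
      }
    }
    return IndexKind.CELL_CHUNK_MAP;
  }
}
```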

bq. BTW HBASE-15179, under this we are doing some PoC and test with off heap 
Memstore.

Please note that as part of this JIRA we change the MSLAB and MemStoreChunkPool 
files. We need to align this with your code that takes MSLAB off-heap.

bq. The 3 ints per cell are also written to chunks we get from the same MSLAB. Do 
we really need this? If we change to 8 bytes per cell and the chunk size is 2 MB, 
we can hold 262144 cells. Will we really have that many? If not, may we waste 
that chunk?

Excellent discussion, [~anoop.hbase]! Those were my thoughts as well… 
Initially, I wrote CellBlockSerialized (now called CellChunkMap) to take a 
byte[] of any size and work with it. However, I later thought that this might 
need to be taken off-heap, and that it may be better to centralize all the 
off-heaping in the Chunks. So if a Chunk is off-heap, then everything 
implemented on top of it is off-heap as well…
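
A minimal sketch of this representation, assuming each index entry is two ints (the chunk ID and the offset of the cell data, with the length recovered from the KeyValue itself) written back to back into a chunk-sized ByteBuffer. Because entries have a fixed size and are stored in sorted order, entry i is read in O(1) at byte i * 8, which is all a binary search over the index needs. ChunkIndex and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

final class ChunkIndex {
  static final int ENTRY_SIZE = 2 * Integer.BYTES; // chunk ID + offset = 8 bytes

  // Heap buffer here; ByteBuffer.allocateDirect would place the index off-heap.
  private final ByteBuffer index;
  private int count;

  ChunkIndex(int chunkSizeBytes) {
    this.index = ByteBuffer.allocate(chunkSizeBytes);
  }

  // Maximum number of entries one index chunk can hold.
  int capacity() { return index.capacity() / ENTRY_SIZE; }

  // Append the next entry; callers must add cells in sorted order.
  void add(int chunkId, int offset) {
    if (count == capacity()) {
      throw new IllegalStateException("index chunk is full");
    }
    int pos = count * ENTRY_SIZE;
    index.putInt(pos, chunkId);
    index.putInt(pos + Integer.BYTES, offset);
    count++;
  }

  int chunkIdAt(int i) { return index.getInt(i * ENTRY_SIZE); }
  int offsetAt(int i) { return index.getInt(i * ENTRY_SIZE + Integer.BYTES); }
  int size() { return count; }
}
```

Since the index is just bytes in a chunk, it follows the chunk on-heap or off-heap without any extra code.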

Now, if we have just 2 ints per Cell representation (2^3 bytes), we can fit 
2^21 / 2^3 = 2^18 cells in a Chunk of size 2MB.
A cell may use 256 = 2^8 bytes for all its data, which is not much. If cells are 
at least that large, one Chunk can represent 2^18 * 2^8 = 2^26 bytes = 64MB, 
which is already half of what we can hold in one MemStore before flushing to 
disk (the default flush size is 128MB). Do we often serve Cells smaller than 
that? If not, then in 99% of cases we will not need a Chunk[] and a single Chunk 
is enough.
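
The arithmetic above as a small helper, assuming 8-byte index entries and a uniform average cell size. ChunkCapacity and its methods are hypothetical names used only for illustration.

```java
final class ChunkCapacity {
  static final long MB = 1024L * 1024;

  // Number of fixed-size index entries that fit into one chunk.
  static long entriesPerChunk(long chunkSize, int entrySize) {
    return chunkSize / entrySize;
  }

  // Bytes of cell data indexed by one full chunk, given an average cell size.
  static long bytesIndexed(long chunkSize, int entrySize, long avgCellSize) {
    return entriesPerChunk(chunkSize, entrySize) * avgCellSize;
  }
}
```

entriesPerChunk(2 * MB, 8) returns 262144 (2^18), and bytesIndexed(2 * MB, 8, 256) returns 67108864 bytes (2^26 = 64MB), half of a 128MB flush size.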

But what if not? What if we have really small cells, like an integer for a key 
and an integer for the data? Is that a possible use case? For such small cells 
the representation of the metadata is actually super-important, as you do not 
want the metadata to be bigger than the data…
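
To put a number on the tiny-cell case, a minimal sketch that assumes the standard tag-less KeyValue layout. That layout is 8 bytes of key/value length ints, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, then the value. It also assumes an 8-byte index entry per cell. The 1-byte family and empty qualifier in the example are assumptions, and SmallCellOverhead is a hypothetical name.

```java
final class SmallCellOverhead {
  // Fixed parts of a tag-less KeyValue: key length + value length ints (8),
  // row length short (2), family length byte (1), timestamp long (8), type byte (1).
  static final int FIXED_KV_BYTES = 4 + 4 + 2 + 1 + 8 + 1; // 20

  static int keyValueSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
    return FIXED_KV_BYTES + rowLen + familyLen + qualifierLen + valueLen;
  }

  // Index-entry bytes relative to the bytes of the cell itself.
  static double indexOverhead(int entryBytes, int cellBytes) {
    return (double) entryBytes / cellBytes;
  }
}
```

For a 4-byte int row key, a 1-byte family, an empty qualifier and a 4-byte int value, keyValueSize(4, 1, 0, 4) is 29 bytes. An 8-byte entry then adds about 28% on top of the data, and a full 2MB index chunk (2^18 entries) covers only 2^18 * 29 bytes = 7.25MB of such cells.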


I will continue answering the other questions already posted here...

> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, 
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, 
> HBASE-14921-V02.patch
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
