[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221626#comment-15221626
]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------
bq. But one question: why has the position of the init() method changed? When a
chunk was taken from the pool, init() previously happened after the CAS
operation, but now it is moved into allocateChunk itself. Will that have
ramifications?
We wanted to concentrate all chunk creation (allocation and initialization) in
the ChunkPool, so that the ChunkPool can manage the mapping to the ID.
Previously a Chunk was initialized only just before it was directly used. I see
your point [~ram_krish]: as currently implemented, Chunks are also initialized
(memory allocated) when they are merely pre-created for the pool. This is not
efficient. I’ll fix that.
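To illustrate the intended fix, here is a minimal sketch of a chunk that can be pre-created by a pool (so it already has an ID) while its memory is allocated only on first use. The class LazyChunk and all of its members are hypothetical names for illustration; this is not the code in the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical chunk: created (and given an ID) up front, memory allocated lazily. */
final class LazyChunk {
    private final int id;          // ID assigned by the (hypothetical) pool
    private final int size;        // capacity in bytes
    private volatile byte[] data;  // stays null until init() runs
    private final AtomicInteger nextFreeOffset = new AtomicInteger(0);

    LazyChunk(int id, int size) {
        this.id = id;
        this.size = size;
    }

    /** Allocates the backing array once; later calls are no-ops. */
    synchronized void init() {
        if (data == null) {
            data = new byte[size];
        }
    }

    boolean isInitialized() {
        return data != null;
    }

    int getId() {
        return id;
    }

    /** Reserves len bytes with a CAS loop; returns the offset, or -1 if the chunk is full. */
    int alloc(int len) {
        if (data == null) {
            init(); // memory is allocated only when the chunk is actually used
        }
        while (true) {
            int old = nextFreeOffset.get();
            if (old + len > size) {
                return -1;
            }
            if (nextFreeOffset.compareAndSet(old, old + len)) {
                return old;
            }
        }
    }
}
```

With this shape, pre-creating many chunks for the pool costs only the small objects, not the full chunk size each.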
bq. I read the new classes in this patch. So in which patch is this being used?
Or will it come later?
Thank you [~anoop.hbase] for taking a look. The CellFlatMap (the CellBlock name
was changed with [~stack]'s help) is going to be part of ImmutableSegment, so
after an in-memory flush the CSLM should be replaced by a CellFlatMap. I am
currently writing this code and hope to present it soon. Some intuition for the
usage can be found in TestCellBlockSet.
bq. We don't need the last int of Cell length. We have the offset to Cell. See
constructor - KeyValue(final byte [] bytes, final int offset)
This is a very good comment! I didn’t think in that direction, but we can enjoy
this “super-compact-representation” :)
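A small sketch of why the length int can be dropped: if each serialized cell begins with its own key length and value length, the total length can be recomputed from the offset alone. The layout assumed here ([4-byte key length][4-byte value length][key][value], no tags) is a simplification, and CompactCellRef with its methods is a hypothetical helper, not an HBase class.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: a cell is located by offset only; its length is read from its header. */
final class CompactCellRef {
    static final int HEADER_BYTES = 2 * Integer.BYTES;

    /** Writes one cell at offset and returns the offset just past it. */
    static int writeCell(byte[] chunk, int offset, byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(chunk, offset, chunk.length - offset);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        return buf.position();
    }

    /** Recomputes the cell's total length from the two length fields stored in the cell itself. */
    static int cellLength(byte[] chunk, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        int keyLen = buf.getInt(offset);
        int valLen = buf.getInt(offset + Integer.BYTES);
        return HEADER_BYTES + keyLen + valLen;
    }
}
```

Under that assumption the per-cell index entry needs only a chunk ID and an offset, i.e. two ints instead of three.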
bq. If we use the Cell[] way, per Cell we have more overhead.
Of course, Cell[] is expensive. It was implemented because it is very simple,
easy to debug, and easy to compare with plain byte-array serialization. But
Cell[] can be useful for very large cells, those bigger than MSLAB Chunks (e.g.
> 2MB). If we know we are going to deal with such very large cells and do not
want to allocate non-reusable, special-size MSLAB Chunks, CellArrayMap (the new
name for CellBlockObjectArray) is a good solution.
bq. BTW HBASE-15179, under this we are doing some PoC and test with off heap
Memstore.
Please note that as part of this JIRA we change the MSLAB and
MemStoreChunkPool files. We need to align with your code that takes MSLAB
off-heap.
bq. The 3 ints per cell also written to chunks we get from same MSLAB. We need
this really? So if we change to 8 bytes per cell, and when chunk size is 2 MB,
we can have 262144 cells. We will have this many really? If not, we may waste
that chunk?
Excellent discussion, [~anoop.hbase]! Those were my thoughts as well…
Initially, I wrote CellBlockSerialized (now called CellChunkMap) to accept a
byte[] of any size and work with it. However, I later thought that this might
need to be taken off-heap, and that it may be better to centralize all the
off-heaping in the Chunks. So if a Chunk is off-heap, then everything
implemented on top of it is off-heap as well…
Now, if we need just two ints per Cell representation (2^3 bytes), we can fit
2^21/2^3=2^18 cells in a Chunk of size 2MB.
A cell may use 256=2^8 bytes for all its data, which is not too much. Do we
often serve Cells smaller than that?
If cells are about that size, then one Chunk can represent 2^18*2^8=2^26 bytes
= 64MB, which is already half of what we can hold in one MemStore without
flushing to disk.
From here, in 99% of cases we will not need a Chunk[] and a single Chunk is
enough.
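The arithmetic above can be checked with a short sketch. ChunkCapacity and its methods are hypothetical names, and the 256-byte average cell size is the assumption made in the text, not a measured value.

```java
/** Hypothetical helper for the index-chunk capacity arithmetic. */
final class ChunkCapacity {
    /** Number of fixed-size index entries that fit in one chunk. */
    static long entriesPerChunk(long chunkBytes, int bytesPerEntry) {
        return chunkBytes / bytesPerEntry;
    }

    /** Amount of cell data one index chunk can reference, given an average cell size. */
    static long bytesCovered(long chunkBytes, int bytesPerEntry, long avgCellBytes) {
        return entriesPerChunk(chunkBytes, bytesPerEntry) * avgCellBytes;
    }

    public static void main(String[] args) {
        long chunkBytes = 1L << 21;                                  // 2MB chunk
        System.out.println(entriesPerChunk(chunkBytes, 8));          // 262144 entries (two ints each)
        System.out.println(bytesCovered(chunkBytes, 8, 256));        // 67108864 bytes = 64MB
    }
}
```

For comparison, with three ints per entry (12 bytes) the same chunk holds 174762 entries.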
But what if not? What if we have really small cells, like an integer for the
key and an integer for the data? Is that a possible use case? For such small
cells the representation of the metadata is actually super-important, as you do
not want the metadata to be bigger than the data…
I will continue answering more questions already posted here...
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch
>
>
> Memory optimizations including compressed format representation and offheap
> allocations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)