[
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213470#comment-15213470
]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------
Thank you [~stack] for such thorough review! Hereby please find my answers:
bq. CellBlockObjectArray presumes passed in array of Cells already sorted?
yes
bq. Calls to createCellBlocks are only when making subMaps?
yes
bq. Does the 'CellBlock' Map have to be a Concurrent Map? This is a readonly
datastructure once made? No need of the 'Concurrent'?
As you have mentioned, CellBlock is implementing ConcurrentNavigbleMap because
delegatee of the CellSet is of type ConcurrentNavigbleMap and we want just to
give CellSet another delegatee instead of ConcurrentSkipListMap.
Neither me nor Eshcar had implemented the CellSet. It was there, just called
CellSkipListSet.
bq. It is odd that comparator is passed on construction and when we call
createCellBlocks…
You are right. As we are working only with Cells we actually need only
CellComparator.COMPARATOR and there is no need to pass it in constructor.
bq. Why name the method getCellFromIndex(int i)? Should it be get( i ) because
we are getting the ith Cell in Map?
I’ll change the name as you say.
bq. It'd be cool to compare a CSLM to one of these CAMs at different scales to
see difference in perf in a test bench.
Sure, this is what I try to do next
bq. For CellBlockObjectArray, it should be called CellArrayMap?
Regarding the names, as “CellBlock” is already in some other use, we suggest
the following names (almost as you suggested).
CellBlock -> CellFlatMap
CellBlockObjectArray -> CellArrayMap
CellBlockSerialized -> CellChunkMap
Why “flat”? Because this is how we call the process of changing the
ConcurrentSkipListMap to CellMap. When we have in-memory-flash and there is no
need to compact (unnecessary copies) we *flatten* the segment. Meaning we
create ImmutableSegment with the same MSLAB, but with different Map, replacing
ConcurrentSkipListMap to CellMap. What do you think?
bq. What does "CellBlockObjectArray is a simple array of Cells allocated using
JVM." mean? All our stuff is on a JVM (smile)
Indeed silly comment :) , will improve it. I just wanted to emphasize that this
is a “java array”, meaning allocated from Java Heap and this is actually an
array of references pointing to Cell objects (as always in Java). All that is
not-efficient when compared to CellBlockSerialized/CellFlatChunkMap which can
be allocated off-heap and is a byte array using less memory and less
de-referencing…
bq. Cell[] block; <<< should be final
yes
bq. private int numOfCellsInsideChunk; <<< Is that total Cells?
no, as you said, it is Cells per chunk and it should be constant for all Chunks
in a CellBlockSerialized, because MSLAB always allocates Chunks with same size
bq. This seems to be the 'key', or 'index key' size, not bytes in a cell?:
private static final int BYTES_IN_CELL = 3*(Integer.SIZE / Byte.SIZE); // each
Cell requires 3 integers
This is the number of bytes we need to represent a reference to a single Cell.
If we want to access Cell number 8 you need to skip 7*BYTES_IN_CELL bytes from
the beginning of the byte array. You can see it as an index key of the Cell as
well. But I think BYTES_IN_CELL is more intuitive.
bq. I love the way this works. Very nice. Simple. Only bit is how you ensure
all Chunks have same amount of Cells?
Thank you!
bq. Remove this stuff rather than just comment it out: //c = (chunkPool
!= null) ? chunkPool.getChunk() : new Chunk(chunkSize, 5); //14921
I left this comment in the code on purpose, in order to raise this question.
This is related to your comment: “The Chunk Pool needs to be on for this all to
work? Is that in this patch?”. Indeed the Chunk Pool must be on for this all to
work. But we are not sure what are the configurations where ChunkPool is off on
purpose and what we can do with such configurations. Do you know?
bq. Is this a hardcoding? : c = new Chunk(chunkSize, 5);
This is typo... forgot it there after debugging… will remove all that
bq. For… "Only used in testing” we usually mark with @VisibleForTestiing
annotation.
Sure, will fix that
bq. What is the machine word? There is the 3 * int that is used to find chunk,
offset and length.. is the 'word' the chunk+offset portion? What CSLM are we
referring to? The MemStore? Or this 'CellBock'?
The sentence came to explain that the CSLM holds little data inside. The CSLM
holds only references to the Cell, which points to the Cell object, which in
turn reference to the Cell’s data on the MSLAB’s Chunk. The machine word is the
size of the reference, which is 32 or 64 bits, depends on the machine. Sorry
for being unclear...
> Memory optimizations
> --------------------
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Eshcar Hillel
> Assignee: Anastasia Braginsky
> Attachments: CellBlocksSegmentInMemStore.pdf,
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch,
> HBASE-14921-V02.patch
>
>
> Memory optimizations including compressed format representation and offheap
> allocations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)