[ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213470#comment-15213470
 ] 

Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

Thank you [~stack] for such thorough review! Hereby please find my answers:

bq. CellBlockObjectArray presumes passed in array of Cells already sorted?

yes

bq. Calls to createCellBlocks are only when making subMaps? 

yes

bq. Does the 'CellBlock' Map have to be a Concurrent Map? This is a readonly 
datastructure once made? No need of the 'Concurrent'?

As you have mentioned, CellBlock is implementing ConcurrentNavigbleMap because 
delegatee of the CellSet is of type ConcurrentNavigbleMap and we want just to 
give CellSet another delegatee instead of ConcurrentSkipListMap.

Neither me nor Eshcar had implemented the CellSet. It was there, just called 
CellSkipListSet.

bq. It is odd that comparator is passed on construction and when we call 
createCellBlocks…

 You are right. As we are working only with Cells we actually need only 
CellComparator.COMPARATOR and there is no need to pass it in constructor.

bq. Why name the method getCellFromIndex(int i)? Should it be get( i ) because 
we are getting the ith Cell in Map?

I’ll change the name as you say.

bq. It'd be cool to compare a CSLM to one of these CAMs at different scales to 
see difference in perf in a test bench.

Sure, this is what I try to do next


bq. For CellBlockObjectArray, it should be called CellArrayMap?

Regarding the names, as “CellBlock” is already in some other use, we suggest 
the following names (almost as you suggested). 

CellBlock  -> CellFlatMap

CellBlockObjectArray  -> CellArrayMap

CellBlockSerialized  -> CellChunkMap

Why “flat”? Because this is how we call the process of changing the 
ConcurrentSkipListMap to CellMap. When we have in-memory-flash and there is no 
need to compact (unnecessary copies) we *flatten* the segment. Meaning we 
create ImmutableSegment with the same MSLAB, but with different Map, replacing 
ConcurrentSkipListMap to CellMap. What do you think?

bq. What does "CellBlockObjectArray is a simple array of Cells allocated using 
JVM." mean? All our stuff is on a JVM (smile)

Indeed silly comment :) , will improve it. I just wanted to emphasize that this 
is a “java array”, meaning allocated from Java Heap and this is actually an 
array of references pointing to Cell objects (as always in Java). All that is 
not-efficient when compared to CellBlockSerialized/CellFlatChunkMap which can 
be allocated off-heap and is a byte array using less memory and less 
de-referencing… 

bq. Cell[] block;    <<< should be final

yes

bq. private int numOfCellsInsideChunk;  <<< Is that total Cells?

no, as you said, it is Cells per chunk and it should be constant for all Chunks 
in a CellBlockSerialized, because MSLAB always allocates Chunks with same size

bq. This seems to be the 'key', or 'index key' size, not bytes in a cell?: 
private static final int BYTES_IN_CELL = 3*(Integer.SIZE / Byte.SIZE); // each 
Cell requires 3 integers

This is the number of bytes we need to represent a reference to a single Cell. 
If we want to access Cell number 8 you need to skip 7*BYTES_IN_CELL bytes from 
the beginning of the byte array. You can see it as an index key of the Cell as 
well. But I think BYTES_IN_CELL is more intuitive.

bq. I love the way this works. Very nice. Simple. Only bit is how you ensure 
all Chunks have same amount of Cells?

Thank you!

bq. Remove this stuff rather than just comment it out:        //c = (chunkPool 
!= null) ? chunkPool.getChunk() : new Chunk(chunkSize, 5); //14921

I left this comment in the code on purpose, in order to raise this question. 
This is related to your comment: “The Chunk Pool needs to be on for this all to 
work? Is that in this patch?”. Indeed the Chunk Pool must be on for this all to 
work. But we are not sure what are the configurations where ChunkPool is off on 
purpose and what we can do with such configurations. Do you know?

bq. Is this a hardcoding? :         c = new Chunk(chunkSize, 5);

This is typo... forgot it there after debugging… will remove all that 

bq. For…  "Only used in testing” we usually mark with @VisibleForTestiing 
annotation.

Sure, will fix that

bq. What is the machine word? There is the 3 * int that is used to find chunk, 
offset and length.. is the 'word' the chunk+offset portion? What CSLM are we 
referring to? The MemStore? Or this 'CellBock'?

The sentence came to explain that the CSLM holds little data inside. The CSLM 
holds only references to the Cell, which points to the Cell object, which in 
turn reference to the Cell’s data on the MSLAB’s Chunk. The machine word is the 
size of the reference, which is 32 or 64 bits, depends on the machine. Sorry 
for being unclear...





> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, 
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, 
> HBASE-14921-V02.patch
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to