[jira] [Commented] (HBASE-16438) Create a cell type so that chunk id is embedded in it

Anastasia Braginsky (JIRA) Mon, 03 Apr 2017 02:23:52 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953177#comment-15953177
 ]


Anastasia Braginsky commented on HBASE-16438:
---------------------------------------------

bq. What specific question in RB are you looking out for? 

OK. I will write here the questions that bother me and to which I don't see 
responses:
1. In ByteBufferChunkCell, please explain why this new class is needed. Why 
can't the existing BBKV just have a new method, getChunkId(), that returns the 
chunk id stored at offset 0 of the backing BB?
2. In ByteBufferKeyValue, in MSLAB, or anywhere else, please add a constant 
giving the size in bytes of the ChunkCell, or what I call the 
cell-representation (chunkId + offset + length + seqId), so I can use it later.
I will review the existing patch once again.
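
To make the two requests above concrete, here is a minimal sketch of what such 
a constant and accessor could look like. The class name, constant names and 
field widths are hypothetical and are not taken from the patch; the sketch 
assumes chunkId, offset and length are ints and seqId is a long.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the cell-representation layout; not part of any patch.
final class CellRepresentationLayout {
    // Assumed field widths: chunkId, offset and length as int, seqId as long.
    static final int CHUNK_ID_BYTES = Integer.BYTES;
    static final int OFFSET_BYTES = Integer.BYTES;
    static final int LENGTH_BYTES = Integer.BYTES;
    static final int SEQ_ID_BYTES = Long.BYTES;

    // chunkId + offset + length
    static final int SIZE_WITHOUT_SEQ_ID =
        CHUNK_ID_BYTES + OFFSET_BYTES + LENGTH_BYTES;

    // chunkId + offset + length + seqId
    static final int SIZE_WITH_SEQ_ID = SIZE_WITHOUT_SEQ_ID + SEQ_ID_BYTES;

    private CellRepresentationLayout() {}

    // Reads the chunk id assumed to sit at offset 0 of the chunk's backing buffer.
    static int getChunkId(ByteBuffer chunkBacking) {
        return chunkBacking.getInt(0);
    }
}
```

Under these assumptions the representation takes 12 bytes without the seqId 
and 20 bytes with it.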

bq. ChunkId is per ByteBuffer backing the chunk. I can change the chunkId to be 
an int.

You got it yourself; I also thought so for a moment. I am talking about the 
chunkId of the chunk where each cell is located, which is saved per cell. 
Please do change chunkId to int, but check for overflow (at least log an 
error). 
I believe we should strive to decrease the number of bytes the cell 
representation takes, because this is the reason we are doing the 
CellChunkMap...
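
As an illustration of the overflow check, here is a minimal sketch that 
narrows a long chunk id to an int and logs an error when it does not fit. The 
class and method names are hypothetical, and java.util.logging is used only to 
keep the example self-contained; it is not the logging facility the patch would 
use.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not part of any patch.
final class ChunkIdNarrowing {
    private static final Logger LOG =
        Logger.getLogger(ChunkIdNarrowing.class.getName());

    private ChunkIdNarrowing() {}

    // Returns the chunk id as an int, or logs an error and throws if the
    // value is negative or larger than Integer.MAX_VALUE.
    static int toIntChunkId(long chunkId) {
        if (chunkId < 0 || chunkId > Integer.MAX_VALUE) {
            LOG.log(Level.SEVERE, "Chunk id {0} does not fit in an int", chunkId);
            throw new IllegalStateException("chunk id overflow: " + chunkId);
        }
        return (int) chunkId;
    }
}
```

Whether to throw after logging or only log is a design choice; this sketch 
throws so that a wrapped-around id is never silently stored in a cell.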

bq. My Q was, this Cell meta data (ChunkId, offset, length) also we planned to 
write to chunks. So what is the difference? In this chunk or that chunk?

Do you mean the seqId is going to be written in the index-chunk only and is 
not going to be written in the main-chunk, which holds the key, value, etc.? 
So no duplication? Are you sure? If so, that is already a little better, but I 
would still like to keep the Cell meta data smaller.
The smaller the Cell meta data is (hopefully only chunkId, offset and length, 
so only 12 bytes), the lower the meta-data overhead per cell and the more cells 
we can squeeze into a single index-chunk (CellChunkMap). The smaller the 
CellChunkMap is, the better the locality for scans, and the binary search can 
hit the processor cache more easily.

bq. The only thing is we should go with fixed 8 bytes for that. 

This is not a desirable situation. We are increasing from 12 bytes to 20 bytes, 
almost twice as much... We should not do it unless it is really necessary...
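
To put a number on the difference, here is a small sketch that computes how 
many cell representations fit into one index-chunk for a given entry size. The 
class and method names are hypothetical, and the 2 MB chunk size in the usage 
note is an assumption chosen only for illustration.

```java
// Hypothetical helper; not part of any patch.
final class IndexChunkCapacity {
    private IndexChunkCapacity() {}

    // Number of fixed-size entries that fit entirely into one chunk.
    static int cellsPerChunk(int chunkSizeBytes, int entryBytes) {
        if (chunkSizeBytes < 0 || entryBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return chunkSizeBytes / entryBytes;
    }
}
```

With an assumed chunk of 2 * 1024 * 1024 bytes, cellsPerChunk gives 174762 
entries at 12 bytes each and 104857 entries at 20 bytes each, so the 12-byte 
representation holds roughly two thirds more cells per index-chunk.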

bq. So now if you are going to write the seqId in the BB backing every cell, 
then the seqId as the state variable is not needed at all and hence you may 
need a new cell representation for it. 

OK. So let's have a new cell representation.

bq. Otherwise we should still go with it and use the seqID as a caching value 
in addition to having it in the BB. 

Why duplicate the same value?


> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
>                 Key: HBASE-16438
>                 URL: https://issues.apache.org/jira/browse/HBASE-16438
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: HBASE-16438_1.patch, 
> HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch, 
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, 
> HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch, 
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell type in which the id of the chunk it was 
> created from is embedded, so that when flattening we can use the chunk id as 
> meta data. More details will follow once the initial tasks are completed. 
> Why we need to embed the chunk id in the Cell is described by [~anastas] in 
> this remark over in the parent issue 
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
