[
https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953177#comment-15953177
]
Anastasia Braginsky commented on HBASE-16438:
---------------------------------------------
bq. What specific question in RB are you looking out for?
OK. Here are the questions that still bother me and that I don't see answered:
1. In ByteBufferChunkCell, please explain why this new class is needed. Why
can't the existing BBKV just get a new method, getChunkId(), that returns the
chunk id stored at offset 0 of the backing BB?
2. In ByteBufferKeyValue, in MSLAB, or anywhere else, please add a constant
giving the size in bytes of the ChunkCell, or what I call the
cell-representation (chunkId + offset + length + seqId), so I can use it later.
I will review the existing patch once again.
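To make the request in question 2 concrete, here is a minimal sketch of what such a constant could look like. The class name CellRepSizes and all field and method names are hypothetical and are not taken from the patch; the sketch also assumes chunkId, offset and length are ints and seqId is a long.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the requested constants; not an existing HBase class.
final class CellRepSizes {
    static final int SIZEOF_CHUNK_ID = Integer.BYTES; // assumes chunkId is an int
    static final int SIZEOF_OFFSET = Integer.BYTES;
    static final int SIZEOF_LENGTH = Integer.BYTES;
    static final int SIZEOF_SEQ_ID = Long.BYTES;

    // chunkId + offset + length
    static final int SIZEOF_CELL_REP =
        SIZEOF_CHUNK_ID + SIZEOF_OFFSET + SIZEOF_LENGTH;
    // chunkId + offset + length + seqId
    static final int SIZEOF_CELL_REP_WITH_SEQ_ID = SIZEOF_CELL_REP + SIZEOF_SEQ_ID;

    // Writes one entry (without seqId) at the given entry index of an index buffer.
    static void writeEntry(ByteBuffer index, int entry,
                           int chunkId, int offset, int length) {
        int pos = entry * SIZEOF_CELL_REP;
        index.putInt(pos, chunkId);
        index.putInt(pos + SIZEOF_CHUNK_ID, offset);
        index.putInt(pos + SIZEOF_CHUNK_ID + SIZEOF_OFFSET, length);
    }

    // Reads back the chunk id of the entry at the given index.
    static int readChunkId(ByteBuffer index, int entry) {
        return index.getInt(entry * SIZEOF_CELL_REP);
    }
}
```

With a single shared constant, any code that walks an index buffer can compute an entry position as entry * SIZEOF_CELL_REP instead of repeating the field sizes.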
bq. ChunkId is per ByteBuffer backing the chunk. I can change the chunkId to be
an int.
You got it yourself; I also thought so for a moment. I am talking about the
ChunkID of the chunk where each cell is located, which is saved per cell.
Please do change chunkID to int, but check for overflow (at least log an
error).
I believe we should strive to reduce the number of bytes the cell
representation takes, because that is the reason we are doing the
CellChunkMap...
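The overflow check asked for above could be sketched roughly as follows. ChunkIdAllocator is a hypothetical name, not the class in the patch; the sketch counts in a long so that running past Integer.MAX_VALUE is detected instead of silently wrapping to a negative id.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator handing out int chunk ids; not the patch's ChunkCreator.
final class ChunkIdAllocator {
    // Counted as a long so that exceeding the int range can be detected.
    private final AtomicLong next;

    ChunkIdAllocator(int firstId) {
        this.next = new AtomicLong(firstId);
    }

    int nextChunkId() {
        long id = next.getAndIncrement();
        if (id > Integer.MAX_VALUE) {
            // At the very least log the error; here we also refuse to continue.
            System.err.println("Chunk id overflow: " + id);
            throw new IllegalStateException("Chunk id overflow: " + id);
        }
        return (int) id;
    }
}
```

Whether to throw or only log on overflow is a design choice; the sketch does both so that a wrapped id can never be written into a cell representation.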
bq. My Q was, this Cell meta data (ChunkId, offset, length) also we planned to
write to chunks. So what is the difference? In this chunk or that chunk?
Do you mean the seqID is going to be written only in the index-chunk and not
in the main-chunk holding the key, value, etc.? So no duplication? Are you
sure? If so, that is already a little better, but I would still like to keep
the Cell meta data smaller.
The smaller the Cell meta data is (hopefully only chunkId, offset and length,
so only 12 bytes), the lower the meta-data overhead per cell and the more cells
we can squeeze into a single index-chunk (CellChunkMap). The smaller the
CellChunkMap is, the more we all enjoy locality for scans and the more easily
the binary search can hit the processor cache.
bq. The only thing is we should go with fixed 8 bytes for that.
This is not a desirable situation. We are increasing from 12 bytes to 20 bytes,
almost double... We should not do it unless it is really necessary...
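To put rough numbers on the 12-versus-20-byte concern, the sketch below counts how many cell representations fit into one index-chunk. The class and method names are hypothetical, and the 2 MB chunk size is only an illustrative assumption, not a value taken from this discussion.

```java
// Hypothetical helper for the capacity arithmetic; not part of the patch.
final class IndexChunkCapacity {
    // Number of whole fixed-size entries that fit into one chunk.
    static int entriesPerChunk(int chunkSizeBytes, int entrySizeBytes) {
        if (chunkSizeBytes <= 0 || entrySizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return chunkSizeBytes / entrySizeBytes;
    }

    public static void main(String[] args) {
        int chunkSize = 2 * 1024 * 1024; // assumed chunk size, for illustration only
        System.out.println("12-byte entries: " + entriesPerChunk(chunkSize, 12));
        System.out.println("20-byte entries: " + entriesPerChunk(chunkSize, 20));
    }
}
```

Under that assumption a chunk holds 174762 entries of 12 bytes but only 104857 entries of 20 bytes, so adding the 8-byte seqId costs about 40% of the index capacity per chunk.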
bq. So now if you are going to write the seqId in the BB backing every cell,
then the seqId as the state variable is not needed at all and hence you may
need a new cell representation for it.
OK. So let's have a new cell representation.
bq. Otherwise we should still go with it and use the seqID as a caching value
in addition to having it in the BB.
Why duplicate the same value?
> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
> Key: HBASE-16438
> URL: https://issues.apache.org/jira/browse/HBASE-16438
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-16438_1.patch,
> HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch,
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch,
> HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch,
> HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch,
> HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch,
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell that embeds the id of the chunk out of
> which it was created, so that when flattening we can use the chunk id as
> meta data. More details will follow once the initial tasks are completed.
> Why we need to embed the chunk id in the Cell is described by [~anastas] in
> this remark over in the parent issue
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119