[
https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940016#comment-15940016
]
ramkrishna.s.vasudevan commented on HBASE-16438:
------------------------------------------------
I have a way to solve this problem. Let's discuss before I put up the patch.
Most of the other RB comments are fixed.
-> Since we need to know whether a chunk is from the pool or not, the Chunk
will have a boolean indicating whether it was created for the pool. Say we have
isFromPool(), which returns true for those chunks.
-> Every chunk will have an AtomicInteger ref count.
-> When the MSLAB does a copyToChunkCell, where we know that the cell has to
have a chunk (one that comes out of the chunkCreator), we increment the
refCount.
-> Now in the MemstoreImpl, when we do getCellSet().add(), we need a new API in
CellSet that returns the cell that was already there in the CSLM, i.e. the
value that CSLM.put() returns. Right now we only have cellSet#add(), which
returns a boolean.
-> On this returned cell (which is the actual duplicate cell) we get the
chunkId from the Cell. Remember we now have a BbChunkCell which can give the
chunkId from the 0th offset.
-> Use this chunkId to actually do a decrement of the reference count of this
chunk. For this we need a decrementChunkRefCount in MSLAB interface. I think it
is valid because MSLAB impl is nothing but Chunks.
-> On doing this decrementChunkRefCount, we could check if the result is now 0
and if so just remove that chunk from the chunkCreator map. This way we make
sure that the reference to the chunk is released immediately.
-> Note that in case the chunk is from the pool, this increment/decrement will
not have any impact. It matters only when we have on-demand chunks.
-> There is an atomic ref count operation happening now, which may add to the
write path overhead. We may need to measure the impact. But remember this is
going to happen only if there are a lot of duplicates, as in HBASE-16195. In a
normal case this should not be a problem, because CSLM#put() is going to return
null as there is no duplicate, so there is no such problem. And in fact in such
a case the GC issue mentioned in HBASE-16195 will not happen, as all the chunks
are needed till the MSLAB is closed.
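
To make the flow above concrete, here is a rough, self-contained Java sketch of
the idea. It is not the patch and does not use the real HBase classes: Chunk,
ChunkCell, chunkMap, cslm, createChunk and addToMemstore are hypothetical
stand-ins, the String key stands in for whatever the cell comparator treats as
a duplicate, and only the names isFromPool, copyToChunkCell and
decrementChunkRefCount come from the proposal itself. It also ignores the case
of a chunk that is still being filled when its count reaches 0.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these types only mimic the roles of the HBase classes.
final class ChunkRefCountSketch {

  // A chunk that knows whether it belongs to the pool and counts its cells.
  static final class Chunk {
    final int id;
    final boolean fromPool;
    final AtomicInteger refCount = new AtomicInteger();

    Chunk(int id, boolean fromPool) {
      this.id = id;
      this.fromPool = fromPool;
    }

    boolean isFromPool() {
      return fromPool;
    }
  }

  // Stand-in for BbChunkCell: a cell that can report its chunk id.
  static final class ChunkCell {
    final String key;
    final int chunkId;

    ChunkCell(String key, int chunkId) {
      this.key = key;
      this.chunkId = chunkId;
    }
  }

  // Stand-in for the chunkCreator map (chunk id -> chunk).
  final Map<Integer, Chunk> chunkMap = new ConcurrentHashMap<>();
  // Stand-in for the CSLM behind the CellSet.
  final ConcurrentSkipListMap<String, ChunkCell> cslm = new ConcurrentSkipListMap<>();

  Chunk createChunk(int id, boolean fromPool) {
    Chunk chunk = new Chunk(id, fromPool);
    chunkMap.put(id, chunk);
    return chunk;
  }

  // Copying a cell into an on-demand chunk takes one reference on that chunk.
  ChunkCell copyToChunkCell(String key, Chunk chunk) {
    if (!chunk.isFromPool()) {
      chunk.refCount.incrementAndGet();
    }
    return new ChunkCell(key, chunk.id);
  }

  // Proposed CellSet API: return the replaced cell (or null), not a boolean.
  ChunkCell add(ChunkCell cell) {
    return cslm.put(cell.key, cell);
  }

  // Drop one reference; forget the chunk once no cell points at it.
  void decrementChunkRefCount(int chunkId) {
    Chunk chunk = chunkMap.get(chunkId);
    if (chunk == null || chunk.isFromPool()) {
      return; // pooled chunks are not reference counted
    }
    if (chunk.refCount.decrementAndGet() == 0) {
      chunkMap.remove(chunkId);
    }
  }

  // Write path: add the cell and release the chunk of any duplicate it replaced.
  void addToMemstore(ChunkCell cell) {
    ChunkCell replaced = add(cell);
    if (replaced != null) {
      decrementChunkRefCount(replaced.chunkId);
    }
  }

  public static void main(String[] args) {
    ChunkRefCountSketch s = new ChunkRefCountSketch();
    Chunk first = s.createChunk(1, false);
    Chunk second = s.createChunk(2, false);
    s.addToMemstore(s.copyToChunkCell("row1", first));
    s.addToMemstore(s.copyToChunkCell("row1", second)); // duplicate of row1
    System.out.println("chunks still referenced: " + s.chunkMap.keySet());
  }
}
```

In the no-duplicate case add() returns null, so the only extra cost in this
sketch is the increment in copyToChunkCell.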
Thoughts!!!
> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
> Key: HBASE-16438
> URL: https://issues.apache.org/jira/browse/HBASE-16438
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-16438_1.patch,
> HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch,
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, HBASE-16438.patch,
> MemstoreChunkCell_memstoreChunkCreator_oldversion.patch,
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell that has the id of the chunk it was
> created from embedded in it, so that when doing flattening we can use the
> chunk id as metadata. More details will follow once the initial tasks are
> completed.
> Why we need to embed the chunkid in the Cell is described by [~anastas] in
> this remark over in parent issue
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119