[
https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583177#comment-14583177
]
ramkrishna.s.vasudevan commented on HBASE-12295:
------------------------------------------------
We had more discussions on the CacheType and MemType.
One more question from Stack was: why do we need two enums, CacheType and
MemType? Would one be enough?
Let me try to explain the use of each (part of this was already said by Ram as
well as by me on RB; pardon the repetition).
CacheType records which cache the block came from, or that it did not come from
a cache at all. This is useful when returning the block. The return is what
does the ref count decrement now, but returning a block could do any other
kind of cleanup; we tried to make it general. So on return, we have to know
which cache the block has to be returned to (L1 or L2). As of today the L1
return is a no-op, but we still do it. If we don't have the CacheType in the
block, we have to return it to both and search in both places.
>>Can you mark the HFileBlock with where it came from when you cache it rather
>>than write out the type with the data?
This is an overhead. There is also another problem with CombinedCache.
>>CombinedCache is one possible combination. Change it if it is making your
>>life more difficult.
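A minimal sketch of the dispatch the CacheType makes possible; the enum, class,
and method names below are illustrative stand-ins I made up, not the actual
classes in the patch:

```java
// Stand-in for the cache the block was served from (or NONE if uncached).
enum CacheType { L1, L2, NONE }

class TrackedBlock {
    final String key;
    final CacheType cacheType; // recorded when the block is served

    TrackedBlock(String key, CacheType cacheType) {
        this.key = key;
        this.cacheType = cacheType;
    }
}

class BlockReturner {
    // With the CacheType carried in the block, the return goes straight to
    // the owning cache; without it we would have to probe both L1 and L2.
    static String destinationFor(TrackedBlock block) {
        switch (block.cacheType) {
            case L1:
                return "L1";   // a no-op today, but kept general
            case L2:
                return "L2";   // this is where the ref count decrements
            default:
                return "none"; // block never came from a cache
        }
    }
}
```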
Consider a block that is demoted from L1 to L2.
For CombinedCache, this would be an index or bloom block only.
While it was in L1, it was served from there to a scanner. But before it was
returned, it got moved to L2. Then another scanner got this same block, this
time from L2, so the ref count for this block (block key) got incremented and
is now 1. (Remember the old scanner got it from L1, so the ref count increment
happened there at that time.) Now the old scanner returns the block and we
return it to both L1 and L2. L2 will have an entry, so its ref count will drop
to zero while an active scanner is still referring to the block. This can
cause issues. If an L3 cache also comes tomorrow, marking the block with where
it came from lets us do the correct return and take the correct action.
>>They'd be the same HFileBlock instance? The same item in cache? We're
>>talking now of moving between caches while something is being used. That'd be
>>a no-no, right? If it's referenced you can't move it, not unless it's L1,
>>where it is safe to move it (you'd just scrub refcounts).
>>Can keep type-specific refcounts if it's an issue?
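The race above can be reproduced with a toy model. `ToyCache` and its per-key
ref counts are an assumption for illustration only, not HBase code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy cache keeping a ref count per block key; purely illustrative.
class ToyCache {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Serve a block to a scanner: increment this cache's count for the key.
    void serve(String blockKey) {
        refCounts.merge(blockKey, 1, Integer::sum);
    }

    // Demotion drops the entry (and its count) from this cache.
    void evictForDemotion(String blockKey) {
        refCounts.remove(blockKey);
    }

    // Return a block: decrement only if this cache has an entry for the key.
    void returnBlock(String blockKey) {
        refCounts.computeIfPresent(blockKey, (k, v) -> v - 1);
    }

    int refCount(String blockKey) {
        return refCounts.getOrDefault(blockKey, 0);
    }
}

class RefCountRace {
    public static void main(String[] args) {
        ToyCache l1 = new ToyCache();
        ToyCache l2 = new ToyCache();
        String key = "block-1";

        l1.serve(key);            // scanner A reads from L1 (L1 count = 1)
        l1.evictForDemotion(key); // block demoted to L2; L1 forgets its count
        l2.serve(key);            // scanner B reads from L2 (L2 count = 1)

        // Scanner A returns the block; without a CacheType in the block it
        // must be returned to BOTH caches.
        l1.returnBlock(key);      // no-op: L1 has no entry any more
        l2.returnBlock(key);      // L2 count drops to 0, yet scanner B is
                                  // still actively reading the block

        System.out.println("L2 ref count = " + l2.refCount(key)); // prints 0
    }
}
```

With the CacheType recorded, scanner A's return would have gone only to L1,
and L2's count would still be 1 while scanner B holds the block.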
The usage of MemType (as of now Shared or NonShared) is for cell creation. We
have to mark Cells backed by a shared-cache memory location as
SharedMemoryCell. Cells that do not come from shared-memory-backed blocks need
not be marked as SharedMemoryCell. A CacheType of L2 does not always mean the
block is backed by shared memory. An example is FileIOEngine: there, while
reading blocks, we have to read the data from files into a heap memory area
(byte[]).
>>Ok
Maybe we can say that, as of today, the return is needed only for the
SharedMem type. Still, we wanted these two to be independent, general
mechanisms and framework. IMHO, this looks cleaner and more extensible for
the future.
>>Ok.
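A sketch of that decision at cell-creation time; `MemType`, `ToyCell`, and the
cell classes below are simplified stand-ins for illustration, not the actual
types in the patch:

```java
// Stand-in for the memory type recorded on a block.
enum MemType { SHARED, NON_SHARED }

interface ToyCell {
    boolean isSharedMemory();
}

// Ordinary cell over heap bytes; no special handling on return.
class HeapCell implements ToyCell {
    public boolean isSharedMemory() { return false; }
}

// Cell over shared cache memory: marked so readers know the backing bytes
// can go away once the block is returned to the cache.
class SharedMemCell implements ToyCell {
    public boolean isSharedMemory() { return true; }
}

class CellFactory {
    // Note the branch is on MemType, not CacheType: an L2 block is not
    // necessarily shared-memory backed (FileIOEngine, for example, copies
    // file data into a heap byte[]).
    static ToyCell createCell(MemType memType) {
        return memType == MemType.SHARED ? new SharedMemCell() : new HeapCell();
    }
}
```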
> Prevent block eviction under us if reads are in progress from the BBs
> ---------------------------------------------------------------------
>
> Key: HBASE-12295
> URL: https://issues.apache.org/jira/browse/HBASE-12295
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf,
> HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_trunk.patch
>
>
> While we try to serve the reads from the BBs directly from the block cache,
> we need to ensure that the blocks does not get evicted under us while
> reading. This JIRA is to discuss and implement a strategy for the same.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)