[
https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563359#comment-14563359
]
ramkrishna.s.vasudevan commented on HBASE-12295:
------------------------------------------------
bq.returnBlock(BlockCacheKey cacheKey, HFileBlock block)
bq.Do we need to return the block too in the above? Won't the key be enough?
Ideally, yes. But in our current impl a block carries a type saying whether it
came from L1 or L2, and hence we needed the block there. Maybe we can pass only
the type of the block instead? That should be possible. Not a big deal.
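To illustrate the "pass only the type" idea, here is a minimal sketch. It is
not the patch's code: CacheTierSketch, TieredBlockCacheSketch, the String key
and the per-tier reference-count maps are all hypothetical stand-ins, assuming
each cache tier keeps its own accounting.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marker for where a block was served from. */
enum CacheTierSketch { L1, L2 }

/** Hypothetical cache facade; only the key and the tier are needed to return a block. */
final class TieredBlockCacheSketch {
  // One reference-count table per tier (an assumption made for illustration).
  private final Map<CacheTierSketch, Map<String, Integer>> refCounts =
      new EnumMap<>(CacheTierSketch.class);

  TieredBlockCacheSketch() {
    for (CacheTierSketch tier : CacheTierSketch.values()) {
      refCounts.put(tier, new HashMap<>());
    }
  }

  /** Called when a reader starts using the block identified by cacheKey. */
  synchronized void retain(String cacheKey, CacheTierSketch tier) {
    refCounts.get(tier).merge(cacheKey, 1, Integer::sum);
  }

  /** Returns the block to its tier and reports the remaining reference count. */
  synchronized int returnBlock(String cacheKey, CacheTierSketch tier) {
    Map<String, Integer> counts = refCounts.get(tier);
    Integer current = counts.get(cacheKey);
    if (current == null) {
      throw new IllegalStateException("returnBlock without retain: " + cacheKey);
    }
    if (current == 1) {
      counts.remove(cacheKey);
      return 0;
    }
    counts.put(cacheKey, current - 1);
    return current - 1;
  }
}
```

With this shape the caller never hands the block itself back, only the key and
the tier it was read from.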
bq.Or, consider that we will want to stream out Cells as they come up out of
the server when we implement a streaming Interface on the server.
Okay. When we tried writing the cells directly to the socket as part of the
POC, things were slow. A different type of protocol/approach may be needed
there.
bq.Hmm... pulling the CellBlock into the Region from the ipc layer? I have
thought that Result should carry CellBlocks.... This would be an extra copy,
right? If we wanted to get to zero copy, would it be possible if we went this
route?
Yes, this will be zero copy. Currently we do no copy while creating the cell
block; we use the encoder directly to create it. The same applies here, except
that it now happens in HRegion.
Making Result carry it is one option. I think you mean the PB Result, right?
The approach here was to keep it simple and use the existing payload. When you
say Result, would that not be the same as what we currently do for non-Java
clients?
bq.Nah. You can't pull an oddball RPC datastructure back into HRegion. Could it
be done in the Result itself?
Same as above.
bq.He has added a bunch of accounting on where scan is at... state, and has
scans doing heartbeating, and early returns. Can you make use of this work of
his?
I had a look at it. Will check once more before commenting back. But in our
case we need to handle both scans and gets. Scans have state, and gets do not,
because a get operates entirely within the Region.
bq.Tell us more about the marking of Cells from L2 with a new Interface and why
CP need special treatment, need Cells copied when read from CP. We have to do
this?
CPs are a bit tricky. Take a CP that implements a postScannerOpen hook by
wrapping the original scanner.
In the non-CP path we control the result and the cell block creation, and we
are sure that once the cell block is created we no longer refer to the cells
from the HFile blocks. But with a CP there is a high chance that those cells
are referenced for a longer time and that the CP uses them as its own state.
In those cases, if we assume the block's ref count can be decremented just
because the results have been fetched, we end up corrupting the state of those
CPs. Hence we need to copy the result.
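As a rough illustration of why the copy matters, the sketch below models a cell
whose bytes live in a shared block buffer and copies it to the heap before it
would be handed to a CP. BlockBackedCellSketch and its methods are hypothetical
names for this example, not HBase's Cell API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical cell whose bytes live inside a shared, reusable block buffer. */
final class BlockBackedCellSketch {
  private final ByteBuffer backing;
  private final int offset;
  private final int length;

  BlockBackedCellSketch(ByteBuffer backing, int offset, int length) {
    this.backing = backing;
    this.offset = offset;
    this.length = length;
  }

  /** Reads the current bytes from the backing buffer without disturbing its position. */
  byte[] valueBytes() {
    byte[] out = new byte[length];
    ByteBuffer view = backing.duplicate(); // independent position/limit, same memory
    view.clear();
    view.position(offset);
    view.get(out, 0, length);
    return out;
  }

  /** Deep copy onto the heap, so the result no longer depends on the block. */
  BlockBackedCellSketch copyToHeap() {
    return new BlockBackedCellSketch(ByteBuffer.wrap(valueBytes()), 0, length);
  }

  String valueAsString() {
    return new String(valueBytes(), StandardCharsets.UTF_8);
  }
}
```

If the block's memory is reused after its ref count drops, the original cell
reads whatever now occupies that memory, while the heap copy keeps the value
the CP saw.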
bq.finalizeScan(boolean finalizeAll).
Though we have completed the implementation, we are still looking for a better
way. I have done some analysis, though, and I fear that may be very tricky. I
can come up with a write-up after some more analysis, but overall the problem
is that the scanner flow has some optimizations where we proactively close
some of the scanners in the heap just because they don't return any result (in
fact we nullify them as well). In such cases just calling close() will not be
enough, because those StoreFileScanners may already be closed and we will have
lost the reference to them.
Hence we thought of adding an explicit API to do it. In addition, for the scan
case the close() call alone won't work, because a scan needs a series of
next() calls to finish and it is better to clear the references to those cells
then and there. Also, in the case of scans the latest block is needed for the
subsequent next() calls, since scans have state.
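A rough sketch of that idea follows. The meaning of the flag is an assumption
here: finalizeAll == false is taken to release every block except the latest
one (still needed by the next next() call), and finalizeAll == true to release
everything. ScanBlockTrackerSketch and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical per-scan bookkeeping of the blocks a scan still references. */
final class ScanBlockTrackerSketch {
  private final Deque<String> heldBlocks = new ArrayDeque<>();
  private final List<String> released = new ArrayList<>();

  /** Records that the scan has started reading from the given block. */
  void onBlockRead(String blockKey) {
    if (!blockKey.equals(heldBlocks.peekLast())) {
      heldBlocks.addLast(blockKey);
    }
  }

  /**
   * Releases block references after a batch of results has been shipped.
   * Assumed semantics: keep the latest block unless finalizeAll is true.
   */
  void finalizeScan(boolean finalizeAll) {
    int keep = finalizeAll ? 0 : 1;
    while (heldBlocks.size() > keep) {
      released.add(heldBlocks.pollFirst()); // stand-in for decrementing the ref count
    }
  }

  List<String> releasedBlocks() {
    return new ArrayList<>(released);
  }

  int heldCount() {
    return heldBlocks.size();
  }
}
```

Under these assumptions, each next() call would end with finalizeScan(false)
and the final close would call finalizeScan(true).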
bq. "In such a case we don’t evict the block if the ref count > 0, instead we
mark those blocks with a Boolean."
This is a special case. After a compaction we know that the compacted files
are no longer needed, and we forcefully try to evict their blocks from the
block cache. But if any parallel scans are operating on those files, we cannot
simply evict them. So we use the same ref count mechanism to check whether the
block can really be evicted (even when the eviction is forced). All such
blocks are automatically evicted once the read operation using that block
completes (that is, when decrementing a 'marked' block brings its count to 0,
we call evict forcefully). This ensures that the results are not corrupted.
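The marking behaviour can be sketched as below. This is only an illustration
under assumptions, not the patch itself: RefCountedBlockSketch is a
hypothetical class, and letting new readers retain a marked block is a choice
made for the sketch.

```java
/** Hypothetical cached block with a reference count and a deferred-eviction mark. */
final class RefCountedBlockSketch {
  private int refCount;
  private boolean markedForEviction;
  private boolean evicted;

  /** A reader takes a reference before using the block; fails once evicted. */
  synchronized boolean retain() {
    if (evicted) {
      return false;
    }
    refCount++;
    return true;
  }

  /** A reader is done; returns true if this release triggered the deferred eviction. */
  synchronized boolean release() {
    if (refCount == 0) {
      throw new IllegalStateException("release without retain");
    }
    refCount--;
    if (refCount == 0 && markedForEviction) {
      evicted = true;
      return true;
    }
    return false;
  }

  /** Forced eviction (e.g. after compaction): evict now, or mark if readers remain. */
  synchronized boolean forceEvict() {
    if (refCount > 0) {
      markedForEviction = true;
      return false;
    }
    evicted = true;
    return true;
  }

  synchronized boolean isMarkedForEviction() {
    return markedForEviction;
  }

  synchronized boolean isEvicted() {
    return evicted;
  }
}
```

A forced evict on a block with an active reader only sets the mark; the
eviction itself happens in the release() that brings the count back to 0.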
> Prevent block eviction under us if reads are in progress from the BBs
> ---------------------------------------------------------------------
>
> Key: HBASE-12295
> URL: https://issues.apache.org/jira/browse/HBASE-12295
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-12295.pdf, HBASE-12295_trunk.patch
>
>
> While we try to serve the reads from the BBs directly from the block cache,
> we need to ensure that the blocks do not get evicted under us while
> reading. This JIRA is to discuss and implement a strategy for the same.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)