[
https://issues.apache.org/jira/browse/HBASE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026231#comment-16026231
]
Anoop Sam John commented on HBASE-17874:
----------------------------------------
We are doing a slice() call on the duplicated BB (4 MB sized) when we read out
a block from it. The slice call changes the capacity to be (newLimit - newPos),
i.e. 64 KB. So functionally it looks ok; it behaves the same way as the other
blocks which are read from DFS or the L1 cache. Sorry, I was missing the call to
slice() that we have at the end
{code}
for (int i = startBuffer, j = 0; i <= endBuffer; ++i, j++) {
  ByteBuffer bb = buffers[i].duplicate();
  if (i == startBuffer) {
    ...
  } else if (i == endBuffer) {
    ...
  } else {
    ...
  }
  mbb[j] = bb.slice();
}
{code}
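The difference can be seen in a small standalone sketch (scaled down to a 4096-byte segment standing in for the 4 MB BucketCache segment, and a 64-byte window standing in for the 64 KB block; the names here are illustrative, not HBase code):

```java
import java.nio.ByteBuffer;

public class SliceVsDuplicate {
    public static void main(String[] args) {
        // Stand-in for one memory-mode BucketCache segment (scaled down from 4 MB).
        ByteBuffer segment = ByteBuffer.allocate(4096);

        // Position/limit a duplicate onto one "block" within the segment.
        ByteBuffer dup = segment.duplicate();
        dup.position(128).limit(128 + 64);

        // duplicate() shares the backing store AND keeps the full capacity,
        // so capacity() still reports the whole segment size.
        System.out.println(dup.capacity());          // 4096

        // slice() re-bases the view: its capacity becomes (limit - position),
        // i.e. just the block, which is what the accounting logic expects.
        ByteBuffer block = dup.slice();
        System.out.println(block.capacity());        // 64
    }
}
```

This is why the trailing `bb.slice()` makes the accounting come out right: the cell's value buffer then reports the block's size, not the segment's.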
> Limiting of read request response size based on block size may go wrong when
> blocks are read from onheap or off heap bucket cache
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-17874
> URL: https://issues.apache.org/jira/browse/HBASE-17874
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17874.patch
>
>
> HBASE-14978 added this size limiting so as to make sure the multi read
> requests do not retain too many blocks. This works well when the blocks are
> obtained from anywhere other than a memory-mode BucketCache. In case of the
> on heap or off heap BucketCache, the entire cache area is split into N
> ByteBuffers, each of size 4 MB. When we hit a block in this cache, we no
> longer copy the data into a temp array; we use the same shared memory (BB).
> Its capacity is 4 MB.
> The block size accounting logic in RSRpcServices is like below:
> {code}
> if (c instanceof ByteBufferCell) {
>   ByteBufferCell bbCell = (ByteBufferCell) c;
>   ByteBuffer bb = bbCell.getValueByteBuffer();
>   if (bb != lastBlock) {
>     context.incrementResponseBlockSize(bb.capacity());
>     lastBlock = bb;
>   }
> } else {
>   // We're using the last block being the same as the current block as
>   // a proxy for pointing to a new block. This won't be exact.
>   // If there are multiple gets that bounce back and forth
>   // then it's possible that this will over-count the size of
>   // referenced blocks. However it's better to over-count and
>   // use two RPCs than to OOME the regionserver.
>   byte[] valueArray = c.getValueArray();
>   if (valueArray != lastBlock) {
>     context.incrementResponseBlockSize(valueArray.length);
>     lastBlock = valueArray;
>   }
> }
> {code}
> We take the BBCell's value buffer and use its capacity. The cell is backed
> by the same BB that backs the HFileBlock. When the HFileBlock is created from
> the BC, we duplicate the BB and set the proper position and limit, as below:
> {code}
> ByteBuffer bb = buffers[i].duplicate();
> if (i == startBuffer) {
>   cnt = bufferSize - startBufferOffset;
>   if (cnt > len) cnt = len;
>   bb.limit(startBufferOffset + cnt).position(startBufferOffset);
> {code}
> Still this BB's capacity is 4 MB.
> This makes the size-limit breach happen too soon. What we expect is that the
> block size defaults to 64 KB, and so we thereby allow cells from different
> blocks to appear in the response. We have a way to check whether we have
> moved from one block to the next:
> {code}
> if (bb != lastBlock) {
>   ...
>   lastBlock = bb;
> }
> {code}
> But just by considering the 1st cell, we have already added 4 MB of size!
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)