[
https://issues.apache.org/jira/browse/HBASE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970445#comment-13970445
]
ramkrishna.s.vasudevan commented on HBASE-10974:
------------------------------------------------
In the existing code
{code}
protected byte[] keyBuffer = new byte[INITIAL_KEY_BUFFER_SIZE];
{code}
A key buffer is created once, and the key of every KV that is decoded is
written into this array (the same byte[] is reused).
For example, see decodeNext() in PrefixKeyDeltaEncoder.
So on every next(), only the bytes after the common prefix are copied into the
buffer.
Now, for every KeyValue that is retrieved using getKeyValue() (either for
comparison in the upper layers or for returning a KV), we do a deep copy of the
key byte[] formed above and also of the value byte[].
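The behaviour described above can be sketched roughly as follows. This is a
simplified illustration, not the actual HBase code: the class
SharedKeyBufferSketch and its methods are hypothetical names, and the real
decoders read the common length and suffix from the encoded block.

```java
import java.util.Arrays;

// Hypothetical sketch of a single key buffer reused for every decoded key.
final class SharedKeyBufferSketch {
    private byte[] keyBuffer = new byte[16]; // reused for every key
    private int keyLength = 0;

    // Keep the first commonLength bytes already in the buffer and
    // overwrite the rest with the suffix read for the next key.
    void decodeNext(int commonLength, byte[] suffix) {
        if (commonLength > keyLength) {
            throw new IllegalArgumentException("common prefix longer than previous key");
        }
        int newLength = commonLength + suffix.length;
        if (newLength > keyBuffer.length) {
            keyBuffer = Arrays.copyOf(keyBuffer, Math.max(newLength, keyBuffer.length * 2));
        }
        System.arraycopy(suffix, 0, keyBuffer, commonLength, suffix.length);
        keyLength = newLength;
    }

    // The buffer is overwritten by the next decodeNext(), so a caller that
    // wants to hold on to the key must receive a deep copy.
    byte[] getKeyCopy() {
        return Arrays.copyOf(keyBuffer, keyLength);
    }
}
```

Because the next decodeNext() overwrites the shared array in place, handing out
a reference instead of a copy would silently change keys the caller already
holds.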
In this patch we instead create one long keyBuffer and keep appending each key
to it once it is decoded, forming a continuous buffer of keys (this also has to
copy the common part from the previously formed key). When we use Cells here,
we don't need to do a deep copy of the keys, and the value part can still refer
to the common buffer (see HBASE-10801).
The other advantage is that later, if we go with Cells backed by offheap BBs or
BRs, we will never need to copy the key[] on heap for comparisons and
retrievals (at least in the StoreScanner layer and below).
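A minimal sketch of the continuous key buffer idea, under stated assumptions:
ContinuousKeyBufferSketch and decodeNext() are hypothetical names, the buffer
is a plain on-heap byte[], and the attached patch may organize this quite
differently.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: every decoded key is appended to one growing buffer
// and handed out as an (offset, length) view instead of a deep copy.
final class ContinuousKeyBufferSketch {
    private byte[] keys = new byte[64];
    private int currentOffset = 0; // start of the most recently decoded key
    private int currentLength = 0; // length of the most recently decoded key

    ByteBuffer decodeNext(int commonLength, byte[] suffix) {
        if (commonLength > currentLength) {
            throw new IllegalArgumentException("common prefix longer than previous key");
        }
        int newOffset = currentOffset + currentLength;
        int newLength = commonLength + suffix.length;
        if (newOffset + newLength > keys.length) {
            // Views handed out earlier keep pointing at the old array,
            // which is never modified again, so they stay valid.
            keys = Arrays.copyOf(keys, Math.max(newOffset + newLength, keys.length * 2));
        }
        // Copy the common part from the previous key, then the new suffix.
        System.arraycopy(keys, currentOffset, keys, newOffset, commonLength);
        System.arraycopy(suffix, 0, keys, newOffset + commonLength, suffix.length);
        currentOffset = newOffset;
        currentLength = newLength;
        // Read-only view over the key bytes; no bytes are copied here.
        return ByteBuffer.wrap(keys, newOffset, newLength).slice().asReadOnlyBuffer();
    }
}
```

Earlier keys are never overwritten, so a view returned for one key remains
usable after later keys are decoded, which is what removes the need for the
deep copy.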
Take a KV with 100 bytes of key and 500 bytes of value, where 50 bytes of the
key are common with the previous key.
The existing code copies only 50 bytes (the uncommon part) from the common
buffer while decoding. But later, getKeyValue() copies the whole 100 bytes of
the key to a new KV buffer.
It also copies the 500 bytes of the value, which can be avoided with Cells
anyway.
With the patch, the key buffer receives the 50 uncommon bytes and also the 50
common bytes copied from the previous KV, but getKeyValue() does no copy at
all.
So in case 1 (existing code) 150 key bytes are copied, and in case 2 (patch)
only 100 key bytes are copied.
When the common part is even smaller, for example in cases where the row
changes, we tend to benefit even more from this.
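The arithmetic above can be written out as a small calculation. This is only
an illustration of the byte counts given in this comment; KeyCopyCost and its
method names are hypothetical, and value bytes are left out because Cells
avoid that copy in both cases.

```java
// Hypothetical helper that counts key bytes copied per KV.
final class KeyCopyCost {
    // Existing code: copy the uncommon part while decoding, then deep copy
    // the whole key in getKeyValue().
    static int existing(int keyLength, int commonLength) {
        return (keyLength - commonLength) + keyLength;
    }

    // With the patch: copy the uncommon part plus the common part from the
    // previous key while decoding; getKeyValue() copies nothing.
    static int patched(int keyLength, int commonLength) {
        return (keyLength - commonLength) + commonLength;
    }
}
```

For the example, existing(100, 50) is 150 and patched(100, 50) is 100. The
difference is keyLength - commonLength, so the saving grows as the common part
shrinks.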
The comparison that happens inside BufferedDataBlockEncoder still works with
the old logic, using the common prefix within the formed key buffer, except
that the offset into the buffer changes.
> Improve DBEs read performance by avoiding byte array deep copies for key[]
> and value[]
> --------------------------------------------------------------------------------------
>
> Key: HBASE-10974
> URL: https://issues.apache.org/jira/browse/HBASE-10974
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.99.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.99.0
>
> Attachments: HBASE-10974_1.patch
>
>
> As part of HBASE-10801, we tried to reduce the copy of the value [] in
> forming the KV from the DBEs.
> The keys still required copying, which restricted us in using Cells because
> a copy always had to be done.
> The idea here is to replace the key byte[] with a ByteBuffer and create a
> consecutive stream of the keys (currently the same byte[] is reused, hence
> the copy). Use offset and length to track positions in this key ByteBuffer.
> The copy of the encoded format to normal Key format is definitely needed and
> can't be avoided but we could always avoid the deep copy of the bytes to form
> a KV and thus use cells effectively. Working on a patch, will post it soon.
--
This message was sent by Atlassian JIRA
(v6.2#6252)