[
https://issues.apache.org/jira/browse/HBASE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975267#comment-13975267
]
Matt Corgan commented on HBASE-10974:
-------------------------------------
{quote}The key is, I suppose, that the entire scanner stack above
StoreFileScanner always uses exactly one KV (the current one). If that were the
case we could allocate and reuse a single KV object. Do I understand this
correctly Matt Corgan?{quote}Yes, that's my understanding too.
{quote}the entire scanner stack above StoreFileScanner always uses exactly one
KV (the current one){quote}Well, there *may* be exceptions where we may need to
"buffer" a standalone KV/Cell, but it could be done lazily in the upper layers
with KeyValueUtils.ensureKeyValue(cell) that does a deep-copy if needed. It
may be possible to filter out many cells before they need to be deep-copied.
On the other hand, maybe there are no exceptions.
Looking at StoreFilescanner.next()
{code}
public Cell next() throws IOException {
Cell retKey = cur;
try {
// only seek if we aren't at the end. cur == null implies 'end'.
if (cur != null) {
hfs.next();
cur = hfs.getKeyValue();
if (hasMVCCInfo)
skipKVsNewerThanReadpoint();
}
} catch(IOException e) {
throw new IOException("Could not iterate " + this, e);
}
return retKey;
}
{code}
it looks like we're calling hfs.next() and then returning the result, which
should be ok unless I'm missing something (which is likely). Ram said he had a
problem here.
but in KeyValueHeap.next()
{code}
public Cell next() throws IOException {
if(this.current == null) {
return null;
}
Cell kvReturn = this.current.next();
Cell kvNext = this.current.peek();
if (kvNext == null) {
this.current.close();
this.current = pollRealKV();
} else {
KeyValueScanner topScanner = this.heap.peek();
// no need to add current back to the heap if it is the only scanner left
if (topScanner != null && this.comparator.compare(kvNext,
topScanner.peek()) >= 0) {
this.heap.add(this.current);
this.current = pollRealKV();
}
}
return kvReturn;
}
{code}
i get confused. We are simultaneously holding a reference to kvReturn and
kvNext which would not be allowed. This definitely looks like a problem -
maybe it's possible to do this lazily.
> Improve DBEs read performance by avoiding byte array deep copies for key[]
> and value[]
> --------------------------------------------------------------------------------------
>
> Key: HBASE-10974
> URL: https://issues.apache.org/jira/browse/HBASE-10974
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.99.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.99.0
>
> Attachments: HBASE-10974_1.patch
>
>
> As part of HBASE-10801, we tried to reduce the copy of the value [] in
> forming the KV from the DBEs.
> The keys required copying and this was restricting us in using Cells and
> always wanted to copy to be done.
> The idea here is to replace the key byte[] as ByteBuffer and create a
> consecutive stream of the keys (currently the same byte[] is used and hence
> the copy). Use offset and length to track this key bytebuffer.
> The copy of the encoded format to normal Key format is definitely needed and
> can't be avoided but we could always avoid the deep copy of the bytes to form
> a KV and thus use cells effectively. Working on a patch, will post it soon.
--
This message was sent by Atlassian JIRA
(v6.2#6252)