[ 
https://issues.apache.org/jira/browse/HBASE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975267#comment-13975267
 ] 

Matt Corgan commented on HBASE-10974:
-------------------------------------

{quote}The key is, I suppose, that the entire scanner stack above 
StoreFileScanner always uses exactly one KV (the current one). If that were the 
case we could allocate and reuse a single KV object. Do I understand this 
correctly Matt Corgan?{quote}Yes, that's my understanding too.

{quote}the entire scanner stack above StoreFileScanner always uses exactly one 
KV (the current one){quote}Well, there *may* be exceptions where we may need to 
"buffer" a standalone KV/Cell, but it could be done lazily in the upper layers 
with KeyValueUtils.ensureKeyValue(cell) that does a deep-copy if needed.  It 
may be possible to filter out many cells before they need to be deep-copied.  
On the other hand, maybe there are no exceptions.

Looking at StoreFilescanner.next()
{code}
  public Cell next() throws IOException {
    Cell retKey = cur;

    try {
      // only seek if we aren't at the end. cur == null implies 'end'.
      if (cur != null) {
        hfs.next();
        cur = hfs.getKeyValue();
        if (hasMVCCInfo)
          skipKVsNewerThanReadpoint();
      }
    } catch(IOException e) {
      throw new IOException("Could not iterate " + this, e);
    }
    return retKey;
  }
{code}
it looks like we're calling hfs.next() and then returning the result, which 
should be ok unless I'm missing something (which is likely).  Ram said he had a 
problem here.

but in KeyValueHeap.next()
{code}
  public Cell next()  throws IOException {
    if(this.current == null) {
      return null;
    }
    Cell kvReturn = this.current.next();
    Cell kvNext = this.current.peek();
    if (kvNext == null) {
      this.current.close();
      this.current = pollRealKV();
    } else {
      KeyValueScanner topScanner = this.heap.peek();
      // no need to add current back to the heap if it is the only scanner left
      if (topScanner != null && this.comparator.compare(kvNext, 
topScanner.peek()) >= 0) {
        this.heap.add(this.current);
        this.current = pollRealKV();
      }
    }
    return kvReturn;
  }
{code}
i get confused.  We are simultaneously holding a reference to kvReturn and 
kvNext which would not be allowed.  This definitely looks like a problem - 
maybe it's possible to do this lazily.

> Improve DBEs read performance by avoiding byte array deep copies for key[] 
> and value[]
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-10974
>                 URL: https://issues.apache.org/jira/browse/HBASE-10974
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.99.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.99.0
>
>         Attachments: HBASE-10974_1.patch
>
>
> As part of HBASE-10801, we  tried to reduce the copy of the value [] in 
> forming the KV from the DBEs. 
> The keys required copying and this was restricting us in using Cells and 
> always wanted to copy to be done.
> The idea here is to replace the key byte[] as ByteBuffer and create a 
> consecutive stream of the keys (currently the same byte[] is used and hence 
> the copy).  Use offset and length to track this key bytebuffer.
> The copy of the encoded format to normal Key format is definitely needed and 
> can't be avoided but we could always avoid the deep copy of the bytes to form 
> a KV and thus use cells effectively. Working on a patch, will post it soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to