[jira] [Comment Edited] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying

Xiang Li (JIRA) Fri, 02 Dec 2016 04:29:24 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715001#comment-15715001
 ]


Xiang Li edited comment on HBASE-14882 at 12/2/16 12:28 PM:
------------------------------------------------------------

Hi [~anoop.hbase], thanks for your time and comments!

May I ask a few questions about your comments?
1. Regarding
bq. This extra copy can be avoided easily
Sorry, I did not follow your idea. Do you mean that there is another function
in KeyValueUtil that can be used in write() here without the extra copy? Or do
you mean to write it as follows? (I originally intended to use the following
code, but ended up using KeyValueUtil#appendToByteArray() in patch 004.)
{code}
  public int write(OutputStream out, boolean withTags) throws IOException {
    // Key length and then value length
    out.write(Bytes.toBytes(KeyValueUtil.keyLength(this)));
    out.write(Bytes.toBytes(getValueLength()));

    // Row length and then row byte array
    out.write(Bytes.toBytes(getRowLength()));
    out.write(getRowArray(), getRowOffset(), getRowLength());

    // Family length and then family byte array
    out.write(getFamilyLength());
    out.write(getFamilyArray(), getFamilyOffset(), getFamilyLength());

    // Qualifier byte array, no qualifier length
    out.write(getQualifierArray(), getQualifierOffset(), getQualifierLength());

    // Timestamp
    out.write(Bytes.toBytes(getTimestamp()));

    // Type
    out.write(getTypeByte());

    // Value
    out.write(getValueArray(), getValueOffset(), getValueLength());

    // Tags length and tags byte array
    if (withTags && getTagsLength() > 0) {
      // Tags length
      byte[] bufferForTagsLength = new byte[2];
      Bytes.putAsShort(bufferForTagsLength, 0, getTagsLength());
      out.write(bufferForTagsLength);

      // Tags byte array
      out.write(getTagsArray(), getTagsOffset(), getTagsLength());
    }

    return getSerializedSize(withTags);
  }
{code}
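
For comparison, here is a minimal sketch of streaming the same layout without the small temporary arrays that Bytes.toBytes() allocates for each length and timestamp field. It assumes the goal is to write primitives directly: java.io.DataOutputStream writes ints, shorts and longs big-endian, the same byte order as Bytes.toBytes(). The class KeyValueStreamSketch and its write() signature are hypothetical (plain byte[] arguments instead of a Cell), not HBase API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not HBase API: writes a cell in the KeyValue layout
// straight from the caller's arrays, with no Bytes.toBytes() temporaries.
public class KeyValueStreamSketch {

  // Key = row length (2) + row + family length (1) + family + qualifier
  //       + timestamp (8) + type (1)
  static int keyLength(byte[] row, byte[] family, byte[] qualifier) {
    return 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
  }

  static int write(OutputStream os, byte[] row, byte[] family, byte[] qualifier,
      long timestamp, byte type, byte[] value, byte[] tags, boolean withTags)
      throws IOException {
    DataOutputStream out = new DataOutputStream(os);
    int keyLength = keyLength(row, family, qualifier);
    out.writeInt(keyLength);       // key length, 4 bytes big-endian
    out.writeInt(value.length);    // value length, 4 bytes
    out.writeShort(row.length);    // row length, 2 bytes
    out.write(row);
    out.writeByte(family.length);  // family length, 1 byte
    out.write(family);
    out.write(qualifier);          // no qualifier length; it is implied
    out.writeLong(timestamp);      // timestamp, 8 bytes
    out.writeByte(type);           // type, 1 byte
    out.write(value);
    int size = 4 + 4 + keyLength + value.length;
    if (withTags && tags.length > 0) {
      out.writeShort(tags.length); // tags length, 2 bytes
      out.write(tags);
      size += 2 + tags.length;
    }
    out.flush();
    return size;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    int n = write(bos, "r".getBytes(), "f".getBytes(), "q".getBytes(),
        1L, (byte) 4 /* assumed Put type code */, "v".getBytes(), new byte[0], true);
    System.out.println(n + " " + bos.size()); // prints "24 24"
  }
}
```

Whether this is cheaper than a single KeyValueUtil#appendToByteArray() copy depends on the underlying OutputStream, since each field becomes a separate call on it; treat it as a trade-off rather than a definite improvement.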

2. Regarding
bq. We add size of 5 refs. All are array type. Means we have to include 5 * 
ClassSize.ARRAY
I put 5 * ClassSize.ARRAY in heapSize() (through the ClassSize.sizeOf() calls),
not in heapOverhead(). Do you mean that ClassSize.ARRAY should be moved into
heapOverhead()? I followed KeyValue, in which the ClassSize.ARRAY of the backing
bytes is included in heapSize().
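
To make the split concrete, here is a minimal sketch in which the shallow overhead covers only the object header and fields, and each backing array's header (the role ClassSize.ARRAY plays) is added in heapSize() together with that array's payload. The constants assume a 64-bit JVM with compressed oops and ignore field packing; HBase's ClassSize determines the real values at runtime. HeapSizeSketch and the five-array cell shape (row, family, qualifier, value, tags) are hypothetical.

```java
// Hypothetical accounting sketch; constants assume a 64-bit JVM with
// compressed oops and ignore the JVM's field reordering/packing.
public class HeapSizeSketch {
  static final int OBJECT = 12;    // assumed object header size
  static final int REFERENCE = 4;  // assumed reference size
  static final int ARRAY = 16;     // assumed array header size
  static final int LONG = 8;
  static final int BYTE = 1;

  // Round up to a multiple of 8 (assumed object alignment).
  static long align(long num) {
    return (num + 7) & ~7L;
  }

  // Shallow part only: header, 5 array references, a long timestamp, a byte type.
  static long heapOverhead() {
    return OBJECT + 5 * REFERENCE + LONG + BYTE;
  }

  // Deep size of one byte[]: its header plus payload, aligned.
  static long sizeOfByteArray(int length) {
    return align(ARRAY + length);
  }

  // Each referenced array pays its ARRAY header exactly once, here.
  static long heapSize(byte[]... arrays) {
    long sum = align(heapOverhead());
    for (byte[] a : arrays) {
      sum += sizeOfByteArray(a.length);
    }
    return sum;
  }

  public static void main(String[] args) {
    // row(4), family(1), qualifier(1), value(5), tags(0)
    System.out.println(heapSize(new byte[4], new byte[1], new byte[1],
        new byte[5], new byte[0])); // prints 160 (48 + 24 + 24 + 24 + 24 + 16)
  }
}
```

Either placement works as long as each array header is counted exactly once; keeping it in heapSize() matches KeyValue, where ClassSize.sizeOf(bytes, length) counts both the length and the array's object overhead.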

3. Regarding heapOverhead() and heapSize() in KeyValue
{code}
  public long heapOverhead() {
    return FIXED_OVERHEAD;
  }

  public long heapSize() {
    long sum = FIXED_OVERHEAD;
    /*
     * Deep object overhead for this KV consists of two parts. The first part is the KV object
     * itself, while the second part is the backing byte[]. We will only count the array overhead
     * from the byte[] only if this is the first KV in there.
     */
    return ClassSize.align(sum) +
        (offset == 0
          ? ClassSize.sizeOf(bytes, length) // count both length and object overhead
          : length);                        // only count the number of bytes
  }
{code}
heapOverhead() does not apply alignment (padding); the overhead is aligned only
in heapSize(). I have a different idea: heapOverhead() should align its value
before returning it, because the padding bytes cannot be used by any other
object. What do you think?
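
A minimal sketch of the proposal, assuming 8-byte object alignment: the padding added by align() is reserved by the JVM for this object, so returning it from heapOverhead() makes heapOverhead() equal to the fixed part that heapSize() already uses. The FIXED_OVERHEAD value of 44 and the ARRAY value of 16 are hypothetical, and align() only mirrors the role of ClassSize.align(). The offset == 0 branch copies the rule from the quoted KeyValue#heapSize().

```java
// Hypothetical sketch; FIXED_OVERHEAD and ARRAY are made-up values.
public class AlignmentSketch {
  static final long FIXED_OVERHEAD = 44; // hypothetical unaligned shallow size
  static final long ARRAY = 16;          // assumed byte[] header size

  // Round up to the next multiple of 8 (assumed object alignment).
  static long align(long num) {
    return (num + 7) & ~7L;
  }

  // Current KeyValue style: raw overhead, no padding.
  static long heapOverheadUnaligned() {
    return FIXED_OVERHEAD;
  }

  // Proposed: include the padding, which no other object can use.
  static long heapOverheadAligned() {
    return align(FIXED_OVERHEAD);
  }

  // Same shape as the quoted KeyValue#heapSize(): the aligned fixed part plus
  // either the whole backing array (first KV in it) or only its bytes.
  static long heapSize(int offset, int length) {
    return align(FIXED_OVERHEAD)
        + (offset == 0 ? align(ARRAY + length) : length);
  }

  public static void main(String[] args) {
    System.out.println(heapOverheadUnaligned()); // 44
    System.out.println(heapOverheadAligned());   // 48
    System.out.println(heapSize(0, 100));        // 48 + 120 = 168
    System.out.println(heapSize(100, 100));      // 48 + 100 = 148
  }
}
```

With the aligned version, heapSize() minus heapOverheadAligned() is exactly the backing-array part, so a caller that sums heapOverhead() values would no longer under-count by the padding (4 bytes per object in this example).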



> Provide a Put API that adds the provided family, qualifier, value without 
> copying
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-14882
>                 URL: https://issues.apache.org/jira/browse/HBASE-14882
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Jerry He
>            Assignee: Xiang Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14882.master.000.patch, 
> HBASE-14882.master.001.patch, HBASE-14882.master.002.patch, 
> HBASE-14882.master.003.patch, HBASE-14882.master.004.patch
>
>
> In the Put API, we have addImmutable()
> {code}
>  /**
>    * See {@link #addColumn(byte[], byte[], byte[])}. This version expects
>    * that the underlying arrays won't change. It's intended
>    * for usage internal HBase to and for advanced client applications.
>    */
>   public Put addImmutable(byte [] family, byte [] qualifier, byte [] value)
> {code}
> But in the implementation, the family, qualifier and value are still copied
> locally to create the kv.
> Ideally, we should provide an API that truly uses the immutable family,
> qualifier and value without copying them.
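
A minimal sketch of the difference the issue describes, using hypothetical classes rather than HBase's KeyValue or Put: CopyingCell copies the caller's family, qualifier and value into one newly allocated backing array, while ReferencingCell keeps the caller's arrays as-is. The second shows why the addImmutable() contract requires that the underlying arrays won't change: a later write by the caller becomes visible through the cell.

```java
import java.util.Arrays;

// Hypothetical classes, not HBase API.
public class ImmutableAddSketch {

  // Copies all three arrays into one freshly allocated backing array.
  static final class CopyingCell {
    final byte[] backing;
    final int valueOffset;

    CopyingCell(byte[] family, byte[] qualifier, byte[] value) {
      backing = new byte[family.length + qualifier.length + value.length];
      System.arraycopy(family, 0, backing, 0, family.length);
      System.arraycopy(qualifier, 0, backing, family.length, qualifier.length);
      valueOffset = family.length + qualifier.length;
      System.arraycopy(value, 0, backing, valueOffset, value.length);
    }

    byte[] value() {
      return Arrays.copyOfRange(backing, valueOffset, backing.length);
    }
  }

  // Keeps references to the caller's arrays; nothing is copied.
  static final class ReferencingCell {
    final byte[] family;
    final byte[] qualifier;
    final byte[] value;

    ReferencingCell(byte[] family, byte[] qualifier, byte[] value) {
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  public static void main(String[] args) {
    byte[] f = {'f'};
    byte[] q = {'q'};
    byte[] v = {1, 2};
    CopyingCell copied = new CopyingCell(f, q, v);
    ReferencingCell referenced = new ReferencingCell(f, q, v);
    v[0] = 9; // caller breaks the "arrays won't change" contract
    System.out.println(copied.value()[0]);     // 1: the copy is unaffected
    System.out.println(referenced.value[0]);   // 9: the change is visible
    System.out.println(referenced.value == v); // true: same array
  }
}
```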



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
