[
https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715001#comment-15715001
]
Xiang Li edited comment on HBASE-14882 at 12/2/16 12:28 PM:
------------------------------------------------------------
Hi [~anoop.hbase], thanks for your time and comments!
May I ask some questions about your comments?
1. Regarding
bq. This extra copy can be avoided easily
Sorry, I did not quite follow. Do you mean that there is another function in
KeyValueUtil that write() could use here without the extra copy? Or do you mean
something like the following? (I originally intended to use this code, but
ended up using KeyValueUtil#appendToByteArray() in patch 004.)
{code}
public int write(OutputStream out, boolean withTags) throws IOException {
  // Key length and then value length
  out.write(Bytes.toBytes(KeyValueUtil.keyLength(this)));
  out.write(Bytes.toBytes(getValueLength()));
  // Row length and then row byte array
  out.write(Bytes.toBytes(getRowLength()));
  out.write(getRowArray(), getRowOffset(), getRowLength());
  // Family length and then family byte array
  out.write(getFamilyLength());
  out.write(getFamilyArray(), getFamilyOffset(), getFamilyLength());
  // Qualifier byte array, no qualifier length
  out.write(getQualifierArray(), getQualifierOffset(), getQualifierLength());
  // Timestamp
  out.write(Bytes.toBytes(getTimestamp()));
  // Type
  out.write(getTypeByte());
  // Value
  out.write(getValueArray(), getValueOffset(), getValueLength());
  // Tags length and tags byte array
  if (withTags && getTagsLength() > 0) {
    // Tags length
    byte[] bufferForTagsLength = new byte[2];
    Bytes.putAsShort(bufferForTagsLength, 0, getTagsLength());
    out.write(bufferForTagsLength);
    // Tags byte array
    out.write(getTagsArray(), getTagsOffset(), getTagsLength());
  }
  return getSerializedSize(withTags);
}
{code}
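To make the two options concrete, here is a minimal sketch that contrasts serializing through a temporary byte[] with writing each field straight to the stream. It uses only java.io and java.nio, not HBase classes. The names CopyVsDirect, writeViaCopy and writeDirect are hypothetical, and the layout is a simplified length-prefixed record, not the real KeyValue format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyVsDirect {
  // Option A: assemble the whole record in a temporary array, then write it once.
  // This costs one extra allocation and one extra copy of row and value per record.
  static int writeViaCopy(OutputStream out, byte[] row, byte[] value) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(4 + row.length + 4 + value.length);
    buf.putInt(row.length).put(row).putInt(value.length).put(value);
    out.write(buf.array());
    return buf.capacity();
  }

  // Option B: write each piece directly from its backing array.
  // Only the 4-byte length prefixes are materialized in small temporary arrays.
  static int writeDirect(OutputStream out, byte[] row, byte[] value) throws IOException {
    out.write(ByteBuffer.allocate(4).putInt(row.length).array());
    out.write(row, 0, row.length);
    out.write(ByteBuffer.allocate(4).putInt(value.length).array());
    out.write(value, 0, value.length);
    return 4 + row.length + 4 + value.length;
  }

  public static void main(String[] args) throws IOException {
    byte[] row = {1, 2, 3};
    byte[] value = {9, 8};
    ByteArrayOutputStream viaCopy = new ByteArrayOutputStream();
    ByteArrayOutputStream direct = new ByteArrayOutputStream();
    int copied = writeViaCopy(viaCopy, row, value);
    int written = writeDirect(direct, row, value);
    // Both paths should report the same size and produce identical bytes.
    System.out.println(copied == written
        && Arrays.equals(viaCopy.toByteArray(), direct.toByteArray()));
  }
}
```

Both methods emit the same bytes; the difference is only whether row and value are copied into an intermediate buffer first.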
2. Regarding
bq. We add size of 5 refs. All are array type. Means we have to include 5 *
ClassSize.ARRAY
I added 5 * ClassSize.ARRAY when calculating heapSize() (where
ClassSize.sizeOf() is called), not in heapOverhead(). Do you mean that
ClassSize.ARRAY should be moved into heapOverhead()? I followed KeyValue, in
which the ClassSize.ARRAY of bytes is included in heapSize().
3. Regarding heapOverhead() and heapSize() in KeyValue
{code}
public long heapOverhead() {
  return FIXED_OVERHEAD;
}

public long heapSize() {
  long sum = FIXED_OVERHEAD;
  /*
   * Deep object overhead for this KV consists of two parts. The first part is the KV object
   * itself, while the second part is the backing byte[]. We will only count the array overhead
   * from the byte[] only if this is the first KV in there.
   */
  return ClassSize.align(sum) +
      (offset == 0
        ? ClassSize.sizeOf(bytes, length) // count both length and object overhead
        : length);                        // only count the number of bytes
}
{code}
heapOverhead() does not do the alignment (padding); the overhead is aligned
only in heapSize(). I have a different idea: heapOverhead() should align its
result before it returns, because the padding added for alignment cannot be
used by anything else. Do you agree?
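For illustration, a minimal sketch of the padding in question, assuming 8-byte object alignment (which, as far as I understand, is what ClassSize.align() rounds to). AlignDemo, its align() helper and the sample value 52 are hypothetical and not taken from KeyValue.

```java
public class AlignDemo {
  // Round up to the next multiple of 8, mimicking 8-byte object alignment (assumption).
  static long align(long size) {
    return (size + 7) & ~7L;
  }

  public static void main(String[] args) {
    long fixedOverhead = 52; // hypothetical unaligned overhead, in bytes
    long aligned = align(fixedOverhead);
    // The difference is padding that the object occupies but nothing else can use.
    System.out.println("unaligned=" + fixedOverhead
        + " aligned=" + aligned
        + " padding=" + (aligned - fixedOverhead));
  }
}
```

With these numbers, an unaligned heapOverhead() would report 4 bytes less than the object actually occupies.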
> Provide a Put API that adds the provided family, qualifier, value without
> copying
> ---------------------------------------------------------------------------------
>
> Key: HBASE-14882
> URL: https://issues.apache.org/jira/browse/HBASE-14882
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Jerry He
> Assignee: Xiang Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14882.master.000.patch,
> HBASE-14882.master.001.patch, HBASE-14882.master.002.patch,
> HBASE-14882.master.003.patch, HBASE-14882.master.004.patch
>
>
> In the Put API, we have addImmutable()
> {code}
> /**
> * See {@link #addColumn(byte[], byte[], byte[])}. This version expects
> * that the underlying arrays won't change. It's intended
> * for usage internal HBase to and for advanced client applications.
> */
> public Put addImmutable(byte [] family, byte [] qualifier, byte [] value)
> {code}
> But in the implementation, the family, qualifier and value are still being
> copied locally to create kv.
> Hopefully we should provide an API that truly uses immutable family,
> qualifier and value.