[jira] [Comment Edited] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying

Xiang Li (JIRA) Fri, 09 Sep 2016 08:54:37 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476403#comment-15476403
 ]


Xiang Li edited comment on HBASE-14882 at 9/9/16 3:54 PM:
----------------------------------------------------------

Hi [~anoop.hbase], I have some questions, could you please take a look at them 
when you have time?

1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, 
Cloneable, SettableSequenceId, SettableTimestamp, Streamable(Streamable is only 
in master branch for 2.0.0).
Do I need to implement them for IndividualBytesFieldCell?

I think SettableSequenceId is a must, because I found an error when testing my 
patch without SettableSequenceId implemented.
When HBase is re-started, the following exception can be found in RS log when 
master asks the RS to open a region
{code}
java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of 
type org.apache.hadoop.hbase.SettableSequenceId
        at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
        at 
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException: Cell is not of type 
org.apache.hadoop.hbase.SettableSequenceId
        ... 9 more
{code}

SettableSequenceId and SettableTimestamp are server-side code to me, but Put is 
client-side code. I am just wondering why the Cell implementation used in 
Put(in client side) are expected to implement some server-side code.

2. About Streamable interface which is added by HBASE-13754.
(1) Do I need to implement Streamable as well? 
Regarding the description of HBASE-13754, you mentioned:
bq. For other cell implementation, we will call getXXXLength() and 
getXXXArray() and write each component one after the other
It can improve the efficiency if I have a single backing array as a whole and 
implement Streamable, but it is still ok if it is not implemented. I will just 
call getXXXArray(), getXXXLength and getXXXOffset() and write each field one by 
one if I override Streamable#write(). So it is not a must, do I get it right?

(2) I read your discussion with Stack but still has a question for not having a 
method to do de-serialization (that is, read()) to pair with write(). 
Regarding your comment in HBASE-13754 when explaining your idea why there is 
only write() but no read()
{quote}
I think that is the correct path especially if we add the method to Cell 
interface itself. All the Cell impls to have default constructor. The Codec 
will create a Cell object of its choice and ask the object to make its data 
from the stream. But with the new interface type I thought to just leave it as 
we are not sure whether all Cells will impl the new interface
{quote}
Do you mean that if one needs to do the de-serialization against my new Cell 
implementation - IndividualBytesFieldCell, he/she needs to cut each field from 
stream and then call the constructor to build a new Cell or a new 
IndividualBytesFieldCell? But in this way, he/she needs to read and fully 
understand what I code in IndividualBytesFieldCell#write(). Maybe it does not 
quite follow the principle of encapsulation and not exposing the internal 
implementation? I mean, he/she must know clearly about the internal fields of 
IndividualBytesFieldCell and how I code write(), in order to implement read() 
correctly.



was (Author: water):
Hi [~anoop.hbase], I have some questions, could you please take a look at them 
when you have time?

1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, 
Cloneable, SettableSequenceId, SettableTimestamp, Streamable(Streamable is only 
in master branch for 2.0.0).
Do I need to implement them for IndividualBytesFieldCell?

I think SettableSequenceId is a must, because I found an error when testing my 
patch without SettableSequenceId implemented.
When HBase is re-started, the following exception can be found in RS log when 
master asks the RS to open a region
{code}
java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of 
type org.apache.hadoop.hbase.SettableSequenceId
        at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750)
        at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
        at 
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException: Cell is not of type 
org.apache.hadoop.hbase.SettableSequenceId
        ... 9 more
{code}

SettableSequenceId and SettableTimestamp are server-side code to me, but Put is 
client-side code. I am just wondering why the Cell implementation used in 
Put(in client side) are expected to implement some server-side code.

2. About Streamable interface which is added by HBASE-13754.
(1) Do I need to implement Streamable as well? 
Regarding the description of HBASE-13754, you mentioned:
bq. For other cell implementation, we will call getXXXLength() and 
getXXXArray() and write each component one after the other
It can improve the efficiency if I have a single backing array as a whole and 
implement Streamable, but it is still ok if it is not implemented. I will just 
call getXXXArray(), getXXXLength and getXXXOffset() and write each field one by 
one if I override Streamable#write(). So it is not a must, do I get it right?

(2) I read your discussion with Stack but still has a question for not having a 
method to do de-serialization (that is, read()) to pair with write(). 
Regarding your comment in HBASE-13754 when explaining your idea why there is 
only write() but no read()
{quote}
I think that is the correct path especially if we add the method to Cell 
interface itself. All the Cell impls to have default constructor. The Codec 
will create a Cell object of its choice and ask the object to make its data 
from the stream. But with the new interface type I thought to just leave it as 
we are not sure whether all Cells will impl the new interface
{quote}
Do you mean that if one needs to do the de-serialization against my new Cell 
implementation - IndividualBytesFieldCell, he/she needs to cut each field from 
stream and then call the constructor to build a new Cell or a new 
IndividualBytesFieldCell? But in this way, he/she needs to read and fully 
understand what I code in IndividualBytesFieldCell#write(). Does it not quite 
follow the rule of encapsulation and not exposing the internal implementation? 
I mean, he/she must know clearly about the internal fields of 
IndividualBytesFieldCell and how I code write(), in order to implement read() 
correctly.


> Provide a Put API that adds the provided family, qualifier, value without 
> copying
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-14882
>                 URL: https://issues.apache.org/jira/browse/HBASE-14882
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Jerry He
>            Assignee: Xiang Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14882.master.000.patch, 
> HBASE-14882.master.001.patch
>
>
> In the Put API, we have addImmutable()
> {code}
>  /**
>    * See {@link #addColumn(byte[], byte[], byte[])}. This version expects
>    * that the underlying arrays won't change. It's intended
>    * for usage internal HBase to and for advanced client applications.
>    */
>   public Put addImmutable(byte [] family, byte [] qualifier, byte [] value)
> {code}
> But in the implementation, the family, qualifier and value are still being 
> copied locally to create kv.
> Hopefully we should provide an API that truly uses immutable family, 
> qualifier and value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying

Reply via email to