[ https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476403#comment-15476403 ]
Xiang Li edited comment on HBASE-14882 at 9/9/16 3:54 PM: ---------------------------------------------------------- Hi [~anoop.hbase], I have some questions, could you please take a look at them when you have time? 1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, Cloneable, SettableSequenceId, SettableTimestamp, Streamable(Streamable is only in master branch for 2.0.0). Do I need to implement them for IndividualBytesFieldCell? I think SettableSequenceId is a must, because I found an error when testing my patch without SettableSequenceId implemented. When HBase is re-started, the following exception can be found in RS log when master asks the RS to open a region {code} java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676) at org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId ... 9 more {code} SettableSequenceId and SettableTimestamp are server-side code to me, but Put is client-side code. I am just wondering why the Cell implementation used in Put(in client side) are expected to implement some server-side code. 2. About Streamable interface which is added by HBASE-13754. (1) Do I need to implement Streamable as well? Regarding the description of HBASE-13754, you mentioned: bq. For other cell implementation, we will call getXXXLength() and getXXXArray() and write each component one after the other It can improve the efficiency if I have a single backing array as a whole and implement Streamable, but it is still ok if it is not implemented. I will just call getXXXArray(), getXXXLength and getXXXOffset() and write each field one by one if I override Streamable#write(). So it is not a must, do I get it right? (2) I read your discussion with Stack but still has a question for not having a method to do de-serialization (that is, read()) to pair with write(). Regarding your comment in HBASE-13754 when explaining your idea why there is only write() but no read() {quote} I think that is the correct path especially if we add the method to Cell interface itself. All the Cell impls to have default constructor. The Codec will create a Cell object of its choice and ask the object to make its data from the stream. But with the new interface type I thought to just leave it as we are not sure whether all Cells will impl the new interface {quote} Do you mean that if one needs to do the de-serialization against my new Cell implementation - IndividualBytesFieldCell, he/she needs to cut each field from stream and then call the constructor to build a new Cell or a new IndividualBytesFieldCell? But in this way, he/she needs to read and fully understand what I code in IndividualBytesFieldCell#write(). Maybe it does not quite follow the principle of encapsulation and not exposing the internal implementation? I mean, he/she must know clearly about the internal fields of IndividualBytesFieldCell and how I code write(), in order to implement read() correctly. was (Author: water): Hi [~anoop.hbase], I have some questions, could you please take a look at them when you have time? 1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, Cloneable, SettableSequenceId, SettableTimestamp, Streamable(Streamable is only in master branch for 2.0.0). Do I need to implement them for IndividualBytesFieldCell? I think SettableSequenceId is a must, because I found an error when testing my patch without SettableSequenceId implemented. When HBase is re-started, the following exception can be found in RS log when master asks the RS to open a region {code} java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676) at org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId ... 9 more {code} SettableSequenceId and SettableTimestamp are server-side code to me, but Put is client-side code. I am just wondering why the Cell implementation used in Put(in client side) are expected to implement some server-side code. 2. About Streamable interface which is added by HBASE-13754. (1) Do I need to implement Streamable as well? Regarding the description of HBASE-13754, you mentioned: bq. For other cell implementation, we will call getXXXLength() and getXXXArray() and write each component one after the other It can improve the efficiency if I have a single backing array as a whole and implement Streamable, but it is still ok if it is not implemented. I will just call getXXXArray(), getXXXLength and getXXXOffset() and write each field one by one if I override Streamable#write(). So it is not a must, do I get it right? (2) I read your discussion with Stack but still has a question for not having a method to do de-serialization (that is, read()) to pair with write(). Regarding your comment in HBASE-13754 when explaining your idea why there is only write() but no read() {quote} I think that is the correct path especially if we add the method to Cell interface itself. All the Cell impls to have default constructor. The Codec will create a Cell object of its choice and ask the object to make its data from the stream. But with the new interface type I thought to just leave it as we are not sure whether all Cells will impl the new interface {quote} Do you mean that if one needs to do the de-serialization against my new Cell implementation - IndividualBytesFieldCell, he/she needs to cut each field from stream and then call the constructor to build a new Cell or a new IndividualBytesFieldCell? But in this way, he/she needs to read and fully understand what I code in IndividualBytesFieldCell#write(). Does it not quite follow the rule of encapsulation and not exposing the internal implementation? I mean, he/she must know clearly about the internal fields of IndividualBytesFieldCell and how I code write(), in order to implement read() correctly. > Provide a Put API that adds the provided family, qualifier, value without > copying > --------------------------------------------------------------------------------- > > Key: HBASE-14882 > URL: https://issues.apache.org/jira/browse/HBASE-14882 > Project: HBase > Issue Type: Improvement > Affects Versions: 1.2.0 > Reporter: Jerry He > Assignee: Xiang Li > Fix For: 2.0.0 > > Attachments: HBASE-14882.master.000.patch, > HBASE-14882.master.001.patch > > > In the Put API, we have addImmutable() > {code} > /** > * See {@link #addColumn(byte[], byte[], byte[])}. This version expects > * that the underlying arrays won't change. It's intended > * for usage internal HBase to and for advanced client applications. > */ > public Put addImmutable(byte [] family, byte [] qualifier, byte [] value) > {code} > But in the implementation, the family, qualifier and value are still being > copied locally to create kv. > Hopefully we should provide an API that truly uses immutable family, > qualifier and value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)