Thanks for the writeup, Ram. This feature is targeting 2.0 release, right ?
bq. If one sees hasArray() as false (a DBB backed Cell) and uses the getXXXArray() API along with offset and length Is there example of the above usage pattern ? Within HBase core, we can make sure the above pattern doesn't exist, right ? Cheers On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan < [email protected]> wrote: > Hi Devs > > This write up is to provide a brief idea on why we need a BB backed cell > and what are the items that we need to take care before introducing new > APIs in Cell that are BB backed. > > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also and > its > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the > history. > > Coming back to the discussion on new APIs, this discussion is based on > supporting BB in the read path (write path is not targeted now) so that we > could work with offheap BBs also. This would avoid copying of data from > BlockCache to the read path ByteBuffer. > > Assume we will be working with BBs in the read path, We will need to > introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself > directly. > If we try to extend the cell or create a new Cell then *everywhere we need > to do instanceOf check or do type conversion *and that is why adding new > APIs to Cell interface itself makes sense. > > Plan is to use this *getXXXBuffer()* API through out the read path *instead > of getXXXArray()*. > > Now there are two ways to use it > > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like how we > use now for getXXXArray() APIs with the offset and length. Doing so would > ensure that every where in the filters and CP one has to just replace the > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset() and > getXXXLength(). We would do some wrapping of the byte[] with a BB incase of > KeyValue type of cells so that getXXXBuffer along with offset and length > holds true everywhere. Note that here if hasArray is true(for KV case) then > getXXXArray() would also work. > > 2)The other way of using this is that use only getXXXBuffer() API and > ensure that the BB is always duplicated/sliced and only the portion of the > total BB is returned which represents the individual component of the Cell. > In this case there is no use of getXXXOffset() (as it is going to be 0) and > getXXXLength() is any way going to be the sliced BB's limit. > > But in the 2nd approach we may end up in creating lot of small objects even > while doing comparison. > > Now the next problem that comes is what to do with the getXXXArray() APIs. > If one sees hasArray() as false (a DBB backed Cell) and uses the > getXXXArray() API along with offset and length - what should we do. Should > we create a byte[] from the DBB and return it? Then in that case what would > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?* > > If we go with the 2nd approach then getXXXBuffer() should be clearly > documented saying that it has to be used without getXXXOffset() and > getXXXLength() and use getXXXOffset() and getXXXLength() only with > getXXXArray(). > > Now if a Cell is backed by on heap BB then we could definitely return > getXXXArray() also - but what to return in the getXXXOffset() would be > determined by what approach to use for getXXXBuffer(). (based on (1) and > (2)). > > We wanted to open up this topic now so that to get some feedback on what > could be an option here. Since it is an user facing Interface we need to be > careful with this. > > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap) > always *hasArray() would be false* in that Cell impl. > > Every where we would use getXXXBuffer() along with getXXXOffest() and > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so that > we have uniformity through the read code and we don't have too many 'if' > else conditions. > > When ever *hasArray() is false* - using getXXXArray() API would throw > *UnSupportedOperation > Exception*. > > As said if we want *getXXXArray()* to be supported as per the existing way > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be clearly > documented. > > Thoughts!!! > > Regards > Ram & Anoop >
