bq. (1) could be preferable as multiple new small objects can be avoided. Agreed. +1
On Thu, Dec 4, 2014 at 8:03 PM, ramkrishna vasudevan < [email protected]> wrote: > >>Is there example of the above usage pattern ? > Just take cases of Filters and CP where a Cell is exposed to the user in > the read path and that cell could be having hasArray - false (Cells backed > by DBB) or true (Cells that are coming from Memstore). > >>Within HBase core, we can > make sure the above pattern doesn't exist, right ? > In HBase core code we could definitely avoid the pattern and infact always > with getXXXBuffer everywhere (use getXXXOffset and getXXXLength) depends on > either (1) or (2) approach that we take. (1) could be preferable as > multiple new small objects can be avoided. > Also would help in KeyValue type Cells also to be used with getXXXBuffer > along with getXXXOffset and getXXXLength. > > Regards > Ram > > > On Fri, Dec 5, 2014 at 9:24 AM, Ted Yu <[email protected]> wrote: > > > Thanks for the writeup, Ram. > > > > This feature is targeting 2.0 release, right ? > > > > bq. If one sees hasArray() as false (a DBB backed Cell) and uses the > > getXXXArray() > > API along with offset and length > > > > Is there example of the above usage pattern ? Within HBase core, we can > > make sure the above pattern doesn't exist, right ? > > > > Cheers > > > > On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan < > > [email protected]> wrote: > > > > > Hi Devs > > > > > > This write up is to provide a brief idea on why we need a BB backed > cell > > > and what are the items that we need to take care before introducing new > > > APIs in Cell that are BB backed. > > > > > > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also > and > > > its > > > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the > > > history. > > > > > > Coming back to the discussion on new APIs, this discussion is based on > > > supporting BB in the read path (write path is not targeted now) so that > > we > > > could work with offheap BBs also. This would avoid copying of data from > > > BlockCache to the read path ByteBuffer. > > > > > > Assume we will be working with BBs in the read path, We will need to > > > introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself > > > directly. > > > If we try to extend the cell or create a new Cell then *everywhere we > > need > > > to do instanceOf check or do type conversion *and that is why adding > new > > > APIs to Cell interface itself makes sense. > > > > > > Plan is to use this *getXXXBuffer()* API through out the read path > > *instead > > > of getXXXArray()*. > > > > > > Now there are two ways to use it > > > > > > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like > how > > we > > > use now for getXXXArray() APIs with the offset and length. Doing so > would > > > ensure that every where in the filters and CP one has to just replace > the > > > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset() > and > > > getXXXLength(). We would do some wrapping of the byte[] with a BB > incase > > of > > > KeyValue type of cells so that getXXXBuffer along with offset and > length > > > holds true everywhere. Note that here if hasArray is true(for KV case) > > then > > > getXXXArray() would also work. > > > > > > 2)The other way of using this is that use only getXXXBuffer() API and > > > ensure that the BB is always duplicated/sliced and only the portion of > > the > > > total BB is returned which represents the individual component of the > > Cell. > > > In this case there is no use of getXXXOffset() (as it is going to be 0) > > and > > > getXXXLength() is any way going to be the sliced BB's limit. > > > > > > But in the 2nd approach we may end up in creating lot of small objects > > even > > > while doing comparison. > > > > > > Now the next problem that comes is what to do with the getXXXArray() > > APIs. > > > If one sees hasArray() as false (a DBB backed Cell) and uses the > > > getXXXArray() API along with offset and length - what should we do. > > Should > > > we create a byte[] from the DBB and return it? Then in that case what > > would > > > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?* > > > > > > If we go with the 2nd approach then getXXXBuffer() should be clearly > > > documented saying that it has to be used without getXXXOffset() and > > > getXXXLength() and use getXXXOffset() and getXXXLength() only with > > > getXXXArray(). > > > > > > Now if a Cell is backed by on heap BB then we could definitely return > > > getXXXArray() also - but what to return in the getXXXOffset() would be > > > determined by what approach to use for getXXXBuffer(). (based on (1) > and > > > (2)). > > > > > > We wanted to open up this topic now so that to get some feedback on > what > > > could be an option here. Since it is an user facing Interface we need > to > > be > > > careful with this. > > > > > > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap) > > > always *hasArray() would be false* in that Cell impl. > > > > > > Every where we would use getXXXBuffer() along with getXXXOffest() and > > > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so > > that > > > we have uniformity through the read code and we don't have too many > 'if' > > > else conditions. > > > > > > When ever *hasArray() is false* - using getXXXArray() API would throw > > > *UnSupportedOperation > > > Exception*. > > > > > > As said if we want *getXXXArray()* to be supported as per the existing > > way > > > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be > clearly > > > documented. > > > > > > Thoughts!!! > > > > > > Regards > > > Ram & Anoop > > > > > >
