Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)

Ted Yu Thu, 04 Dec 2014 20:25:13 -0800

bq. (1) could be preferable as multiple new small objects can be avoided.

Agreed. +1


On Thu, Dec 4, 2014 at 8:03 PM, ramkrishna vasudevan <
[email protected]> wrote:

> >>Is there example of the above usage pattern ?
> Just take cases of Filters and CP where a Cell is exposed to the user in
> the read path and that cell could be having hasArray - false (Cells backed
> by DBB) or true (Cells that are coming from Memstore).
> >>Within HBase core, we can
> make sure the above pattern doesn't exist, right ?
> In HBase core code we could definitely avoid the pattern and infact always
> with getXXXBuffer everywhere (use getXXXOffset and getXXXLength) depends on
> either (1) or (2) approach that we take. (1) could be preferable as
> multiple new small objects can be avoided.
> Also would help in KeyValue type Cells also to be used with getXXXBuffer
> along with getXXXOffset and getXXXLength.
>
> Regards
> Ram
>
>
> On Fri, Dec 5, 2014 at 9:24 AM, Ted Yu <[email protected]> wrote:
>
> > Thanks for the writeup, Ram.
> >
> > This feature is targeting 2.0 release, right ?
> >
> > bq. If one sees hasArray() as false (a DBB backed Cell) and uses the
> > getXXXArray()
> > API along with offset and length
> >
> > Is there example of the above usage pattern ? Within HBase core, we can
> > make sure the above pattern doesn't exist, right ?
> >
> > Cheers
> >
> > On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > Hi Devs
> > >
> > > This write up is to provide a brief idea  on why we need a BB backed
> cell
> > > and what are the items that we need to take care before introducing new
> > > APIs in Cell that are BB backed.
> > >
> > > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also
> and
> > > its
> > > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the
> > > history.
> > >
> > > Coming back to the discussion on new APIs, this discussion is based on
> > > supporting BB in the read path (write path is not targeted now) so that
> > we
> > > could work with offheap BBs also. This would avoid copying of data from
> > > BlockCache to the read path ByteBuffer.
> > >
> > > Assume we will be working with BBs in the read path, We will need to
> > >  introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself
> > > directly.
> > > If we try to extend the cell or create a new Cell then *everywhere we
> > need
> > > to do instanceOf check or do type conversion *and that is why adding
> new
> > > APIs to Cell interface itself makes sense.
> > >
> > > Plan is to use this *getXXXBuffer()* API through out the read path
> > *instead
> > > of getXXXArray()*.
> > >
> > > Now there are two ways to use it
> > >
> > > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like
> how
> > we
> > > use now for getXXXArray() APIs with the offset and length. Doing so
> would
> > > ensure that every where in the filters and CP one has to just replace
> the
> > > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset()
> and
> > > getXXXLength(). We would do some wrapping of the byte[] with a BB
> incase
> > of
> > > KeyValue type of cells so that getXXXBuffer along with offset and
> length
> > > holds true everywhere. Note that here if hasArray is true(for KV case)
> > then
> > > getXXXArray() would also work.
> > >
> > > 2)The other way of using this is that use only getXXXBuffer() API and
> > > ensure that the BB is always duplicated/sliced and only the portion of
> > the
> > > total BB is returned which represents the individual component of the
> > Cell.
> > > In this case there is no use of getXXXOffset() (as it is going to be 0)
> > and
> > > getXXXLength() is any way going to be the sliced BB's limit.
> > >
> > > But in the 2nd approach we may end up in creating lot of small objects
> > even
> > > while doing comparison.
> > >
> > > Now the next problem that comes is what to do with the getXXXArray()
> > APIs.
> > > If one sees hasArray() as false (a DBB backed Cell) and uses the
> > > getXXXArray() API along with offset and length - what should we do.
> > Should
> > > we create a byte[] from the DBB and return it? Then in that case what
> > would
> > > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?*
> > >
> > > If we go with the 2nd approach then getXXXBuffer() should be clearly
> > > documented saying that it has to be used without getXXXOffset() and
> > > getXXXLength() and use getXXXOffset() and getXXXLength() only with
> > > getXXXArray().
> > >
> > > Now if a Cell is backed by on heap BB then we could definitely return
> > > getXXXArray() also - but what to return in the getXXXOffset() would be
> > > determined by what approach to use for getXXXBuffer(). (based on (1)
> and
> > > (2)).
> > >
> > > We wanted to open up this topic now so that to get some feedback on
> what
> > > could be an option here. Since it is an user facing Interface we need
> to
> > be
> > > careful with this.
> > >
> > > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap)
> > > always *hasArray() would be false* in that Cell impl.
> > >
> > > Every where we would use getXXXBuffer() along with getXXXOffest() and
> > > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so
> > that
> > > we have uniformity through the read code and we don't have too many
> 'if'
> > > else conditions.
> > >
> > > When ever *hasArray() is false* - using getXXXArray() API would throw
> > > *UnSupportedOperation
> > > Exception*.
> > >
> > > As said if we want *getXXXArray()* to be supported as per the existing
> > way
> > > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be
> clearly
> > > documented.
> > >
> > > Thoughts!!!
> > >
> > > Regards
> > > Ram & Anoop
> > >
> >
>

Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)

Reply via email to