Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)

ramkrishna vasudevan Thu, 04 Dec 2014 22:43:52 -0800

We thought about this suggestion too but this comes with a lot of more
confusion (?) - what do you think?
Adding new APIs but similar in behaviour could lead to lot more confusion?
Internally it is more easy.



On Fri, Dec 5, 2014 at 11:10 AM, Jonathan Hsieh <[email protected]> wrote:

> Ram,
>
> Can we essentially do both by creating the new getXxBuffer, and then also
> creating new offset and len apis -- getXxx(Bb|Buffer|Buf|B)Offset and
> getXxx(Bb|Buffer|Buf|B)Length?
>
> The old getXxxArray could use the old getXxxOffset and getXxxLength calls.
> Also we'd deprecate all of these and provide an sliced and diced version
> that would have offset always 0.
>
> This way we wouldn't conflate the Bb and byte[] offsets and lengths.  Also
> we could behind the scenes convert Bbs to byte[] arrays and convert
> byte[]'s in to Bbs while maintaining the same interface.
>
> Jon.
>
>
>
> On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan <
> [email protected]> wrote:
>
> > Hi Devs
> >
> > This write up is to provide a brief idea  on why we need a BB backed cell
> > and what are the items that we need to take care before introducing new
> > APIs in Cell that are BB backed.
> >
> > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also and
> > its
> > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the
> > history.
> >
> > Coming back to the discussion on new APIs, this discussion is based on
> > supporting BB in the read path (write path is not targeted now) so that
> we
> > could work with offheap BBs also. This would avoid copying of data from
> > BlockCache to the read path ByteBuffer.
> >
> > Assume we will be working with BBs in the read path, We will need to
> >  introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself
> > directly.
> > If we try to extend the cell or create a new Cell then *everywhere we
> need
> > to do instanceOf check or do type conversion *and that is why adding new
> > APIs to Cell interface itself makes sense.
> >
> > Plan is to use this *getXXXBuffer()* API through out the read path
> *instead
> > of getXXXArray()*.
> >
> > Now there are two ways to use it
> >
> > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like how
> we
> > use now for getXXXArray() APIs with the offset and length. Doing so would
> > ensure that every where in the filters and CP one has to just replace the
> > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset() and
> > getXXXLength(). We would do some wrapping of the byte[] with a BB incase
> of
> > KeyValue type of cells so that getXXXBuffer along with offset and length
> > holds true everywhere. Note that here if hasArray is true(for KV case)
> then
> > getXXXArray() would also work.
> >
> > 2)The other way of using this is that use only getXXXBuffer() API and
> > ensure that the BB is always duplicated/sliced and only the portion of
> the
> > total BB is returned which represents the individual component of the
> Cell.
> > In this case there is no use of getXXXOffset() (as it is going to be 0)
> and
> > getXXXLength() is any way going to be the sliced BB's limit.
> >
> > But in the 2nd approach we may end up in creating lot of small objects
> even
> > while doing comparison.
> >
> > Now the next problem that comes is what to do with the getXXXArray()
> APIs.
> > If one sees hasArray() as false (a DBB backed Cell) and uses the
> > getXXXArray() API along with offset and length - what should we do.
> Should
> > we create a byte[] from the DBB and return it? Then in that case what
> would
> > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?*
> >
> > If we go with the 2nd approach then getXXXBuffer() should be clearly
> > documented saying that it has to be used without getXXXOffset() and
> > getXXXLength() and use getXXXOffset() and getXXXLength() only with
> > getXXXArray().
> >
> > Now if a Cell is backed by on heap BB then we could definitely return
> > getXXXArray() also - but what to return in the getXXXOffset() would be
> > determined by what approach to use for getXXXBuffer(). (based on (1) and
> > (2)).
> >
> > We wanted to open up this topic now so that to get some feedback on what
> > could be an option here. Since it is an user facing Interface we need to
> be
> > careful with this.
> >
> > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap)
> > always *hasArray() would be false* in that Cell impl.
> >
> > Every where we would use getXXXBuffer() along with getXXXOffest() and
> > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so
> that
> > we have uniformity through the read code and we don't have too many 'if'
> > else conditions.
> >
> > When ever *hasArray() is false* - using getXXXArray() API would throw
> > *UnSupportedOperation
> > Exception*.
> >
> > As said if we want *getXXXArray()* to be supported as per the existing
> way
> > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be clearly
> > documented.
> >
> > Thoughts!!!
> >
> > Regards
> > Ram & Anoop
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // [email protected] // @jmhsieh
>

Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)

Reply via email to