We thought about this suggestion too but this comes with a lot of more confusion (?) - what do you think? Adding new APIs but similar in behaviour could lead to lot more confusion? Internally it is more easy.
On Fri, Dec 5, 2014 at 11:10 AM, Jonathan Hsieh <[email protected]> wrote: > Ram, > > Can we essentially do both by creating the new getXxBuffer, and then also > creating new offset and len apis -- getXxx(Bb|Buffer|Buf|B)Offset and > getXxx(Bb|Buffer|Buf|B)Length? > > The old getXxxArray could use the old getXxxOffset and getXxxLength calls. > Also we'd deprecate all of these and provide an sliced and diced version > that would have offset always 0. > > This way we wouldn't conflate the Bb and byte[] offsets and lengths. Also > we could behind the scenes convert Bbs to byte[] arrays and convert > byte[]'s in to Bbs while maintaining the same interface. > > Jon. > > > > On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan < > [email protected]> wrote: > > > Hi Devs > > > > This write up is to provide a brief idea on why we need a BB backed cell > > and what are the items that we need to take care before introducing new > > APIs in Cell that are BB backed. > > > > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also and > > its > > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the > > history. > > > > Coming back to the discussion on new APIs, this discussion is based on > > supporting BB in the read path (write path is not targeted now) so that > we > > could work with offheap BBs also. This would avoid copying of data from > > BlockCache to the read path ByteBuffer. > > > > Assume we will be working with BBs in the read path, We will need to > > introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself > > directly. > > If we try to extend the cell or create a new Cell then *everywhere we > need > > to do instanceOf check or do type conversion *and that is why adding new > > APIs to Cell interface itself makes sense. > > > > Plan is to use this *getXXXBuffer()* API through out the read path > *instead > > of getXXXArray()*. > > > > Now there are two ways to use it > > > > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like how > we > > use now for getXXXArray() APIs with the offset and length. Doing so would > > ensure that every where in the filters and CP one has to just replace the > > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset() and > > getXXXLength(). We would do some wrapping of the byte[] with a BB incase > of > > KeyValue type of cells so that getXXXBuffer along with offset and length > > holds true everywhere. Note that here if hasArray is true(for KV case) > then > > getXXXArray() would also work. > > > > 2)The other way of using this is that use only getXXXBuffer() API and > > ensure that the BB is always duplicated/sliced and only the portion of > the > > total BB is returned which represents the individual component of the > Cell. > > In this case there is no use of getXXXOffset() (as it is going to be 0) > and > > getXXXLength() is any way going to be the sliced BB's limit. > > > > But in the 2nd approach we may end up in creating lot of small objects > even > > while doing comparison. > > > > Now the next problem that comes is what to do with the getXXXArray() > APIs. > > If one sees hasArray() as false (a DBB backed Cell) and uses the > > getXXXArray() API along with offset and length - what should we do. > Should > > we create a byte[] from the DBB and return it? Then in that case what > would > > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?* > > > > If we go with the 2nd approach then getXXXBuffer() should be clearly > > documented saying that it has to be used without getXXXOffset() and > > getXXXLength() and use getXXXOffset() and getXXXLength() only with > > getXXXArray(). > > > > Now if a Cell is backed by on heap BB then we could definitely return > > getXXXArray() also - but what to return in the getXXXOffset() would be > > determined by what approach to use for getXXXBuffer(). (based on (1) and > > (2)). > > > > We wanted to open up this topic now so that to get some feedback on what > > could be an option here. Since it is an user facing Interface we need to > be > > careful with this. > > > > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap) > > always *hasArray() would be false* in that Cell impl. > > > > Every where we would use getXXXBuffer() along with getXXXOffest() and > > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so > that > > we have uniformity through the read code and we don't have too many 'if' > > else conditions. > > > > When ever *hasArray() is false* - using getXXXArray() API would throw > > *UnSupportedOperation > > Exception*. > > > > As said if we want *getXXXArray()* to be supported as per the existing > way > > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be clearly > > documented. > > > > Thoughts!!! > > > > Regards > > Ram & Anoop > > > > > > -- > // Jonathan Hsieh (shay) > // HBase Tech Lead, Software Engineer, Cloudera > // [email protected] // @jmhsieh >
