[ 
https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124875#comment-15124875
 ] 

stack commented on HBASE-15180:
-------------------------------



bq. My plan is to make a Cell aware ByteArrayInputStream which can read Cells 
directly from it.

Where do we need this? (Trying to follow along.) In the current patch I see it 
being used inside the IPCUtils method that returns a CellScanner. It seems odd 
to use this new Stream in that method just to hand it to the Codec, which then 
implements the CellScanner Interface.

bq. Plan to introduce some thing like a CodecContext associated with every 
Codec instance which can say the server/client context.

Why do we need a Context? Don't we currently make a decoder per Cell type 
and/or context? Then we keep a simple Codec API, and any messy parsing is 
internal to the Codec implementation?

bq. SO u suggest renaming of the interface. That should be fine and looks 
better.

Yeah, I think the suggested name is better, but let's spend some time on how 
this stuff will be used first.

I remember being here with this Codec stuff. I kept bumping into the need for a 
CellInputStream but in the end was able to make do with CellScanner; that was 
then, and things may be different now.

bq. To avoid the overhead of parsing tagsLength every time this was done.

Yeah. Let's move away from passing these withTags flags around the code base. 
When we decode, we should be able to cheaply figure out whether tags are 
present or not; let's fix that rather than pass an extra flag all over.
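
One way to cheaply figure out whether tags are present at decode time is to 
compare the total cell length against the key and value lengths. The sketch 
below assumes the serialized KeyValue layout of a 4-byte key length, a 4-byte 
value length, the key, the value, and then an optional 2-byte tags length 
followed by the tags. The class and method names (TagsCheck, tagsLength, 
hasTags) are hypothetical and not part of the patch or the HBase API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class TagsCheck {
    // The two int fields (key length, value length) that start a serialized KeyValue.
    static final int KEY_VALUE_INFRASTRUCTURE_SIZE = 2 * Integer.BYTES;
    // The short that holds the tags length, present only when the cell has tags.
    static final int TAGS_LENGTH_SIZE = Short.BYTES;

    /** Returns the length of the tags in bytes, or 0 when the cell carries no tags. */
    static int tagsLength(byte[] buf, int offset, int cellLength) {
        ByteBuffer bb = ByteBuffer.wrap(buf, offset, cellLength); // big-endian by default
        int keyLen = bb.getInt();
        int valueLen = bb.getInt();
        // Whatever is left after the key and value can only be the tags section.
        int remaining = cellLength - (KEY_VALUE_INFRASTRUCTURE_SIZE + keyLen + valueLen);
        return remaining > 0 ? remaining - TAGS_LENGTH_SIZE : 0;
    }

    static boolean hasTags(byte[] buf, int offset, int cellLength) {
        return tagsLength(buf, offset, cellLength) > 0;
    }
}
```

The check reads only the two length ints that the decoder needs anyway, so no 
flag has to be threaded through the callers.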

bq. This was needed because of the way we have this PushbackIS. 

Shouldn't we pass the length when we create the PBIS derivative?

bq. Now any way you suggest add a new config to decide this copy or not rather 
than rely on MSLAB. 

Can we ask our environment whether we are on the server side and, if so, just 
skip the copy and presume that MSLAB (or something else, if MSLAB is off) will 
assume ownership of the Cells so we can let go of the buffer?  Doing this is a 
little more indirect, but better, I think, than having an MSLAB reference in RPC.
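
To make the copy versus no-copy trade-off concrete, here is a minimal sketch 
of a decoder over a request byte[] in which each cell is preceded by a 4-byte 
length. With copy off, each cell is a view over the shared buffer, so whoever 
consumes it must take ownership of the bytes before the buffer is released. 
With copy on, each cell gets its own byte[]. CellSlice, 
LengthPrefixedDecoder and the copy flag are hypothetical names for 
illustration, not the patch's classes or the HBase API.

```java
import java.util.Arrays;

// Hypothetical view of one cell's bytes inside some backing array.
final class CellSlice {
    final byte[] buf;
    final int offset;
    final int length;

    CellSlice(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical decoder: reads [4-byte length][cell bytes] records from a byte[].
final class LengthPrefixedDecoder {
    private final byte[] buf;
    private final int limit;
    private final boolean copy;
    private int pos;

    LengthPrefixedDecoder(byte[] buf, int offset, int length, boolean copy) {
        this.buf = buf;
        this.pos = offset;
        this.limit = offset + length;
        this.copy = copy;
    }

    boolean hasNext() {
        return pos < limit;
    }

    CellSlice next() {
        int len = readInt(buf, pos);
        pos += Integer.BYTES;
        if (len < 0 || pos + len > limit) {
            throw new IllegalStateException("Corrupt cell length: " + len);
        }
        CellSlice cell = copy
            ? new CellSlice(Arrays.copyOfRange(buf, pos, pos + len), 0, len) // private copy
            : new CellSlice(buf, pos, len);                                  // view, no copy
        pos += len;
        return cell;
    }

    // Big-endian int read without allocating a temporary byte[4].
    private static int readInt(byte[] b, int p) {
        return ((b[p] & 0xff) << 24) | ((b[p + 1] & 0xff) << 16)
             | ((b[p + 2] & 0xff) << 8) | (b[p + 3] & 0xff);
    }
}
```

In this sketch the caller decides the mode once, when the decoder is created, 
which is roughly what asking the environment "are we on the server side?" 
would amount to.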


> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
>
>
> In KeyValueDecoder#parseCell (the default Codec decoder) we use 
> KeyValueUtil#iscreate to read cells from the InputStream. Here we first create 
> a byte[] of length 4 and read the cell length into it, then create a second 
> array of the Cell's length, read the cell bytes into it, and create a KV.
> On the server we actually read the requests into a byte[], and the CellScanner 
> is created over a ByteArrayInputStream on top of it. In the write path MSLAB 
> is ON by default, so while adding Cells to the memstore we copy the Cell 
> bytes to MSLAB memory chunks (default 2 MB size) and recreate the Cells over 
> those bytes.  So there is no issue if we create Cells directly over the RPC 
> read byte[] here in the Decoder.  There is no need to create two byte[]s and 
> copy for every Cell in the request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells 
> directly from it.  
> The same Codec path is also used on the client side. There it is better to 
> avoid this direct Cell creation and keep copying into smaller byte[]s.  Plan 
> to introduce some thing like a CodecContext associated with every Codec 
> instance which can say the server/client context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
