[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124875#comment-15124875 ]
stack commented on HBASE-15180:
-------------------------------
bq. My plan is to make a Cell aware ByteArrayInputStream which can read Cells
directly from it.
Where do we need this? (Trying to follow along.) In the current patch I see it
being used inside the IPCUtils method that returns a CellScanner. It seems odd
to use this new Stream in that method just to hand it to the Codec, which then
provides the CellScanner interface.
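For anyone following along, here is a minimal sketch of what a Cell-aware
ByteArrayInputStream could look like. It is not the code from the patch:
CellSliceInputStream, CellSlice and readCellSlice are hypothetical names, and
it assumes each cell is preceded by a 4-byte big-endian length, which is how
the issue description says the current decoder reads cells.

```java
import java.io.ByteArrayInputStream;

/** Sketch only: hypothetical names, not the class from the patch. */
final class CellSliceInputStream extends ByteArrayInputStream {

    /** Where one cell lives inside the shared backing array. */
    static final class CellSlice {
        final byte[] array;
        final int offset;
        final int length;

        CellSlice(byte[] array, int offset, int length) {
            this.array = array;
            this.offset = offset;
            this.length = length;
        }
    }

    CellSliceInputStream(byte[] buf, int offset, int length) {
        super(buf, offset, length);
    }

    /**
     * Reads a 4-byte big-endian length prefix, then returns a view over the
     * next 'length' bytes of the backing array without copying them.
     * Returns null at end of stream.
     */
    CellSlice readCellSlice() {
        if (available() == 0) {
            return null;
        }
        if (available() < 4) {
            throw new IllegalStateException("truncated length prefix");
        }
        // buf and pos are protected fields of ByteArrayInputStream.
        int len = ((buf[pos] & 0xff) << 24) | ((buf[pos + 1] & 0xff) << 16)
                | ((buf[pos + 2] & 0xff) << 8) | (buf[pos + 3] & 0xff);
        pos += 4;
        if (len < 0 || len > available()) {
            throw new IllegalStateException("truncated cell");
        }
        CellSlice slice = new CellSlice(buf, pos, len);
        pos += len; // skip over the cell bytes instead of copying them out
        return slice;
    }
}
```

The point of the sketch is only that neither the 4-byte length array nor the
per-cell array gets allocated; whether this belongs in the stream or inside
the Codec's decoder is the open question.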
bq. Plan to introduce some thing like a CodecContext associated with every
Codec instance which can say the server/client context.
Why do we need a Context? Don't we currently make a decoder per Cell type and/or
context? Then we keep a simple Codec API and any messy parsing stays internal to
the Codec implementation?
bq. SO u suggest renaming of the interface. That should be fine and looks
better.
Yeah, I think the suggested name is better, but let's spend some time first on
how this stuff will be used.
I remember being here with this Codec stuff before. I kept bumping into the need
for a CellInputStream but in the end was able to make do with CellScanner; that
was then, and stuff may be different now.
bq. To avoid the overhead of parsing tagsLength every time this was done.
Yeah. Let's move away from passing these withTags flags around the code base.
When we decode, we should be able to cheaply figure out whether tags are present
or not; let's fix that rather than pass an extra flag all over.
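A rough sketch of how the check could be done from the lengths alone. It
assumes the usual KeyValue serialization (4-byte key length, 4-byte value
length, key bytes, value bytes, then an optional 2-byte tags length followed
by the tags) and that the total cell length is already known from the length
prefix. TagsLengthSketch and tagsLength are made-up names for illustration.

```java
import java.nio.ByteBuffer;

/** Sketch only: class and method names are made up for illustration. */
final class TagsLengthSketch {
    static final int KEY_LEN_SIZE = 4;   // int key length
    static final int VALUE_LEN_SIZE = 4; // int value length
    static final int TAGS_LEN_SIZE = 2;  // short tags length

    /**
     * Returns the number of tag bytes in a serialized cell occupying
     * [offset, offset + length) of buf, or 0 if it has no tags section.
     */
    static int tagsLength(byte[] buf, int offset, int length) {
        ByteBuffer bb = ByteBuffer.wrap(buf, offset, length);
        int keyLen = bb.getInt();
        int valueLen = bb.getInt();
        // Whatever is left after the key and value must be the tags section.
        int remaining = length - (KEY_LEN_SIZE + VALUE_LEN_SIZE + keyLen + valueLen);
        return remaining > 0 ? remaining - TAGS_LEN_SIZE : 0;
    }
}
```

With something like this the decoder reads two ints it needs anyway, so no
caller has to say up front whether tags are expected.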
bq. This was needed because of the way we have this PushbackIS.
Shouldn't we pass the length when we create the PBIS derivative?
bq. Now any way you suggest add a new config to decide this copy or not rather
than rely on MSLAB.
Can we ask our environment whether we are on the server side and, if so, just do
the non-copy and presume that MSLAB (or something else, if MSLAB is off) will
assume ownership of the Cells so we can let go of the buffer? Doing this is a
little more indirect, but better, I think, than having an MSLAB reference in RPC.
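To make the suggestion concrete, a minimal sketch of a decode step that picks
copy or no-copy from where it is running. Everything here is hypothetical
(CopyPolicySketch, Side, Slice, decode); how the real code would learn which
side it is on is exactly what is still to be decided.

```java
import java.util.Arrays;

/** Sketch only: all names here are hypothetical. */
final class CopyPolicySketch {

    /** Where the decoder is running. */
    enum Side { CLIENT, SERVER }

    /** A cell's bytes as an (array, offset, length) triple. */
    static final class Slice {
        final byte[] array;
        final int offset;
        final int length;

        Slice(byte[] array, int offset, int length) {
            this.array = array;
            this.offset = offset;
            this.length = length;
        }
    }

    static Slice decode(byte[] rpcBuffer, int offset, int length, Side side) {
        if (side == Side.SERVER) {
            // Reference the RPC buffer directly; assumes something downstream
            // (MSLAB or otherwise) copies the cell before the buffer is released.
            return new Slice(rpcBuffer, offset, length);
        }
        // Client side: copy into a right-sized array so the large RPC buffer
        // is not pinned by long-lived cells.
        byte[] copy = Arrays.copyOfRange(rpcBuffer, offset, offset + length);
        return new Slice(copy, 0, length);
    }
}
```

Note there is no MSLAB reference anywhere in it; the RPC side only knows which
side it is on, and the ownership assumption lives in a comment.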
> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
> Key: HBASE-15180
> URL: https://issues.apache.org/jira/browse/HBASE-15180
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
>
>
> In KeyValueDecoder#parseCell (the default Codec decoder) we use
> KeyValueUtil#iscreate to read cells from the InputStream. Here we first create
> a byte[] of length 4 and read the cell length into it, then create a second
> byte[] of the cell's length, read the cell bytes into it, and create a KV.
> On the server we actually read the requests into a byte[], and the CellScanner
> is created over a ByteArrayInputStream on top of that array. By default, MSLAB
> is ON in the write path, so while adding Cells to the memstore we copy the Cell
> bytes into MSLAB memory chunks (default 2 MB size) and recreate the Cells over
> those bytes. So there is no issue if we create Cells directly over the RPC read
> byte[] here in the Decoder. There is no need to create two byte[]s and copy for
> every Cell in the request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells
> directly from it.
> The same Codec path is also used on the client side. There it is better to
> avoid this direct Cell creation and keep copying into smaller byte[]s. Plan
> to introduce some thing like a CodecContext associated with every Codec
> instance which can say the server/client context.