[
https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124586#comment-15124586
]
Anoop Sam John commented on HBASE-15180:
----------------------------------------
CellScanner - Ya we made it like normal our scanner way of call advance and
then get current. You mean we should have been doing just getting next Cell
from the Decoder?
bq. If so, we have CellOutputStream, should your CellReadable be a
CellInputStream with read methods that return Cells to mirror the write methods
we have in CellOutputStream. Your CellReadableByteArrayInputStream would become
CellByteArrayInputStream and would implement CellInputStream.
SO u suggest renaming of the interface. That should be fine and looks better.
bq.I've asked this before I know but do we have to flag when tags and when
without? Internally, when we read, the Cell will know if it has tags or not?
To avoid the overhead of parsing tagsLength every time this was done. The old
method we used to read cells from InputStream was
iscreate(final InputStream in, boolean withTags)
So KVCodecWithTags only pass this as true and we make KeyValue instance. By
default we have KVCodec which pass false and we make NoTagsKV on which
getTagsLength is just return 0. But there is still an issue more. When we
copy the Cell to MSLAB before adding to Memstore, we do copy and make a new
KeyValue only. So this NoTagsKV becomes KV there !!! Need handle but another
issue.
Ya as u said, we can avoid passing this boolean and after the Cell is been read
out in new CellInputStream (KeyValue object then), we need to parse tags length
once and see it is 0 or >0 and based on that recreate NoTagsKV if needed. That
is one parsing op extra and one Object creation extra. For that gone with
passing the boolean at that time I believe. What do u say?
{quote}
What is the length in the below?
Cell readCell(int length, boolean withTags) throws IOException;
Do we have to pass this in each time?
{quote}
This was needed because of the way we have this PushbackIS. In Decoder impl,
we wrap the actual IS with this PBIS. The Decoder advance operation, reads one
byte to know whether it is end. And now the remaining 3 bytes and old already
read byte to be considered to get the Cell's total length. Said that the read
of the Cell's length has to happen in the PBIS impl. (there only we know the
old already read single byte).. And the read of the Cell (Construction of
Cell) to be done in CellByteArrayInputStream. That is why we have to pass the
length. I agree it looks ugly... I had no other way.. This PBIS thing was done
to read Cells in WAL decoder properly. In case of reading Cells from byte[]
from RPC, we dont really have such an issue. Let me see some way we can solve
this. May be we need special treatment for both cases.
bq.IPCUtil takes a Configuration? Can we not just read the Configuration on
construction rather than pass this flag per call?
IPCUtil #createCellScanner(final Codec codec, final CompressionCodec
compressor, final byte [] cellBlock) - Used from RpcClient
new method called from RpcServer alone. Yes I dont want this direct Cell
create to happen in client side where we dont have an MSLAB copy. Still as u
said, if we read the conf property and set in IPCUtil and use that while
construction of Stream, I can avoid this passing of boolean. If any chance
the config xml in server is used in the client and that says use MSLAB (Still
we dont have as we dont have Memstore and all there), we will do it wrongly no.
That is I was afraid to do that.
Now any way you suggest add a new config to decide this copy or not rather than
rely on MSLAB. I think ya it is good. Only downside is again a new config! But
ya it looks better.. Let me do that..
> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
> Key: HBASE-15180
> URL: https://issues.apache.org/jira/browse/HBASE-15180
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
>
>
> In KeyValueDecoder#parseCell (Default Codec decoder) we use
> KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create
> a byte[] of length 4 and read the cell length and then an array of Cell's
> length and read in cell bytes into it and create a KV.
> Actually in server we read the reqs into a byte[] and CellScanner is created
> on top of a ByteArrayInputStream on top of this. By default in write path, we
> have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell
> bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that
> bytes. So there is no issue if we create Cells over the RPC read byte[]
> directly here in Decoder. No need for 2 byte[] creation and copy for every
> Cell in request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells
> directly from it.
> Same Codec path is used in client side also. There better we can avoid this
> direct Cell create and continue to do the copy to smaller byte[]s path. Plan
> to introduce some thing like a CodecContext associated with every Codec
> instance which can say the server/client context.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)