[ 
https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124586#comment-15124586
 ] 

Anoop Sam John commented on HBASE-15180:
----------------------------------------

CellScanner - Ya we made it like normal our scanner way of call advance and 
then get current.  You mean we should have been doing just getting next Cell 
from the Decoder?  
bq. If so, we have CellOutputStream, should your CellReadable be a 
CellInputStream with read methods that return Cells to mirror the write methods 
we have in CellOutputStream. Your CellReadableByteArrayInputStream would become 
CellByteArrayInputStream and would implement CellInputStream.
SO u suggest renaming of the interface.  That should be fine and looks better.

bq.I've asked this before I know but do we have to flag when tags and when 
without? Internally, when we read, the Cell will know if it has tags or not?
To avoid the overhead of parsing tagsLength every time this was done.  The old 
method we used to read cells from InputStream was
iscreate(final InputStream in, boolean withTags)
So KVCodecWithTags only pass this as true and we make KeyValue instance.  By 
default we have KVCodec which pass false and we make NoTagsKV on which 
getTagsLength is just return 0.  But there is still an issue more.  When we 
copy the Cell to MSLAB before adding to Memstore, we do copy and make a new 
KeyValue only. So this NoTagsKV becomes KV there !!!    Need handle but another 
issue.
Ya as u said, we can avoid passing this boolean and after the Cell is been read 
out in new CellInputStream (KeyValue object then), we need to parse tags length 
once and see it is 0 or >0 and based on that recreate NoTagsKV if needed.  That 
is one parsing op extra and one Object creation extra. For that gone with 
passing the boolean at that time I believe.  What do u say?

{quote}
What is the length in the below?
Cell readCell(int length, boolean withTags) throws IOException;
Do we have to pass this in each time?
{quote}
This was needed because of the way we have this PushbackIS.  In Decoder impl, 
we wrap the actual IS with this PBIS.  The Decoder advance operation, reads one 
byte to know whether it is end.  And now the remaining 3 bytes and old already 
read byte to be considered to get the Cell's total length.  Said that the read 
of the Cell's length has to happen in the PBIS impl.  (there only we know the 
old already read single byte)..  And the read of the Cell (Construction of 
Cell) to be done in CellByteArrayInputStream.  That is why we have to pass the 
length. I agree it looks ugly... I had no other way..  This PBIS thing was done 
to read Cells in WAL decoder properly.   In case of reading Cells from byte[] 
from RPC, we dont really have such an issue.   Let me see some way we can solve 
this. May be we need special treatment for both cases.

bq.IPCUtil takes a Configuration? Can we not just read the Configuration on 
construction rather than pass this flag per call?
IPCUtil #createCellScanner(final Codec codec, final CompressionCodec 
compressor,  final byte [] cellBlock)  -  Used from RpcClient
new method called from RpcServer alone.  Yes I dont want this direct Cell 
create to happen in client side where we dont have an MSLAB copy.   Still as u 
said, if we read the conf property and set in IPCUtil and use that while 
construction of Stream, I can avoid this passing of boolean.   If any chance 
the config xml in server is used in the client and that says use MSLAB (Still 
we dont have as we dont have Memstore and all there), we will do it wrongly no. 
  That is I was afraid to do that.  

Now any way you suggest add a new config to decide this copy or not rather than 
rely on MSLAB. I think ya it is good. Only downside is again a new config!  But 
ya it looks better..  Let me do that.. 

> Reduce garbage created while reading Cells from Codec Decoder
> -------------------------------------------------------------
>
>                 Key: HBASE-15180
>                 URL: https://issues.apache.org/jira/browse/HBASE-15180
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15180.patch, HBASE-15180_V2.patch
>
>
> In KeyValueDecoder#parseCell (Default Codec decoder) we use 
> KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create 
> a byte[] of length 4 and read the cell length and then an array of Cell's 
> length and read in cell bytes into it and create a KV.
> Actually in server we read the reqs into a byte[] and CellScanner is created 
> on top of a ByteArrayInputStream on top of this. By default in write path, we 
> have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell 
> bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that 
> bytes.  So there is no issue if we create Cells over the RPC read byte[] 
> directly here in Decoder.  No need for 2 byte[] creation and copy for every 
> Cell in request.
> My plan is to make a Cell aware ByteArrayInputStream which can read Cells 
> directly from it.  
> Same Codec path is used in client side also. There better we can avoid this 
> direct Cell create and continue to do the copy to smaller byte[]s path.  Plan 
> to introduce some thing like a CodecContext associated with every Codec 
> instance which can say the server/client context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to