[ 
https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708228#comment-13708228
 ] 

ramkrishna.s.vasudevan commented on HBASE-8496:
-----------------------------------------------

I would like to get your suggestions on this.
-> The ideal solution would be to use the codecs to work with Tags.  But the 
current APIs in Codec are better suited to the WAL and RPC than to an HFile 
level implementation.  At least they are not straightforward to use there.  
The HFile scan involves positional seeks: we tend not to read KV by KV, and 
most of the time we keep repositioning the byte buffer and form the KVs from 
it.
Using a codec would not allow this, because Codec.advance() only lets us 
advance one KV at a time, and this type of advance() is not suitable for a 
positional seek.  Hence we would either have to introduce APIs like previous() 
or backward() (for seekBefore) in the CellScanner, or create a Seeker 
interface in the Decoder (like the Seeker in the DataBlockEncoders) and 
implement the positional seeks in it.  I am a bit reluctant to do this type of 
positional seek in the Util classes (as discussed with Stack).
-> Another problem when we use Codec would be with the DataBlockEncoders.  The 
DataBlockEncoders currently work as follows:
HFileWriter->append(kv) -> form HFileBlock byte buffer -> Encoders read the 
byte buffer -> Encode per KV into a new byte buffer -> The new byte buffer is 
persisted.
Read flow
==========
Read the encoded byte buffer -> The Seekers in the DataBlockEncoders decode the 
byte buffer to form the actual byte buffer.
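
To make the difference between a forward-only advance() and the positional 
access offered by the Seekers concrete, here is a minimal sketch.  All names 
in it (SeekSketch, ForwardScanner, BlockSeeker, ListSeeker) are hypothetical 
stand-ins and not the real HBase interfaces, and plain sorted strings stand in 
for KVs in a block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real HBase interfaces.
final class SeekSketch {
    /** Forward-only iteration: one entry per advance(), no way to go back. */
    interface ForwardScanner {
        boolean advance();
        String current();
    }

    /** Adds the positional operations that an HFile block scan relies on. */
    interface BlockSeeker extends ForwardScanner {
        /** Moves back to the position before the first entry. */
        void rewind();

        /**
         * Positions on the last key <= target, or on the last key < target
         * when seekBefore is true. Returns false (position unchanged) if no
         * such key exists.
         */
        boolean seekTo(String target, boolean seekBefore);
    }

    /** Seeker over an in-memory sorted list, standing in for a decoded block. */
    static final class ListSeeker implements BlockSeeker {
        private final List<String> sortedKeys;
        private int pos = -1; // -1 means "before the first entry"

        ListSeeker(List<String> sortedKeys) {
            this.sortedKeys = new ArrayList<>(sortedKeys);
        }

        @Override
        public boolean advance() {
            if (pos + 1 >= sortedKeys.size()) {
                return false;
            }
            pos++;
            return true;
        }

        @Override
        public String current() {
            return sortedKeys.get(pos); // only valid after a successful advance/seek
        }

        @Override
        public void rewind() {
            pos = -1;
        }

        @Override
        public boolean seekTo(String target, boolean seekBefore) {
            int found = -1;
            for (int i = 0; i < sortedKeys.size(); i++) {
                int cmp = sortedKeys.get(i).compareTo(target);
                if (cmp < 0 || (cmp == 0 && !seekBefore)) {
                    found = i;
                } else {
                    break;
                }
            }
            if (found < 0) {
                return false;
            }
            pos = found;
            return true;
        }
    }
}
```

A caller holding only a ForwardScanner can emulate seekBefore only by 
rescanning from the start and remembering the previous entry, which is the 
cost the comment above is trying to avoid.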

When we try to use Codec we may want to modify this as
HFileWriter->Codec.encode(kv) -> form HFileBlock byte buffer -> 
Codec.decode(ByteBuffer) to form KVs -> Encode per KV into a new byte buffer -> 
[If this KV has tags we may again need an encoder here for the tags] -> The new 
byte buffer is persisted   (1)
Read flow
========
Read the encoded byte buffer -> The Seekers in the DataBlockEncoders decode the 
byte buffer to form the actual byte buffer.
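
Both write flows above share the step "encode per KV into a new byte buffer", 
and both read flows share the matching decode.  The following is a rough 
sketch of that block-level step only.  It assumes a made-up layout (a 4-byte 
length followed by the key bytes) and a simple shared-prefix encoding; the 
class BlockEncodeSketch and its methods are hypothetical and do not reproduce 
the actual HBase encoders or their on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of block-level encoding; not the real HBase encoders.
final class BlockEncodeSketch {
    /** Plain block: each entry is [int length][key bytes]. */
    static ByteBuffer plainBlock(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        int size = 0;
        for (String k : keys) {
            byte[] b = k.getBytes(StandardCharsets.UTF_8);
            raw.add(b);
            size += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] b : raw) {
            buf.putInt(b.length).put(b);
        }
        buf.flip();
        return buf;
    }

    /** Reads the plain block entry by entry and writes a prefix-encoded copy. */
    static ByteBuffer encode(ByteBuffer plain) {
        ByteBuffer in = plain.duplicate(); // leave the caller's position alone
        // Each entry grows by at most 4 bytes, and there are at most
        // remaining/4 entries, so twice the input size is always enough.
        ByteBuffer out = ByteBuffer.allocate(in.remaining() * 2);
        byte[] prev = new byte[0];
        while (in.hasRemaining()) {
            byte[] cur = new byte[in.getInt()];
            in.get(cur);
            int common = 0;
            int max = Math.min(prev.length, cur.length);
            while (common < max && prev[common] == cur[common]) {
                common++;
            }
            // Entry: [int commonPrefixLen][int suffixLen][suffix bytes]
            out.putInt(common).putInt(cur.length - common)
               .put(cur, common, cur.length - common);
            prev = cur;
        }
        out.flip();
        return out;
    }

    /** Read side: rebuilds every key from the previous key plus its suffix. */
    static List<String> decodeKeys(ByteBuffer encoded) {
        ByteBuffer in = encoded.duplicate();
        List<String> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        while (in.hasRemaining()) {
            int common = in.getInt();
            int suffixLen = in.getInt();
            byte[] cur = new byte[common + suffixLen];
            System.arraycopy(prev, 0, cur, 0, common);
            in.get(cur, common, suffixLen);
            keys.add(new String(cur, StandardCharsets.UTF_8));
            prev = cur;
        }
        return keys;
    }
}
```

The point of the sketch is the shape of the pipeline: the encoder consumes a 
whole block buffer rather than individual KVs, so putting a Codec in front of 
it adds an extra decode pass instead of replacing anything.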

One thing to note is that we may have to rework all the encoding algorithms to 
work with Tags, either by subclassing the existing ones or by writing new ones.  
Now how can this decision be made?  Here again we have a few options:
-> If the user has tags, add new encoding algorithms to the DataBlockEncoding 
enum, like PrefixKeyDeltaEncodingWithTags, FastDiffkeyEncodingWithTags etc., 
and whenever we see that the codec used for the HFile has the ability to 
understand tags we just use the new algorithms.
-> The other way could be to let the code internally instantiate the new 
classes and work with them to handle the Tags as well.  But this would involve 
code changes with some if/else checks, and this would apply to every 
algorithm.  Tomorrow if a new codec is added then we may have to keep doing 
this.
-> Another thing that Anoop suggested was to have a new HFileCodec, which 
internally would have the HFileCompressedEncoder.  Every time a new type of 
codec is added, it is up to the user to implement the Prefixkey, Fastdiff, 
Diffkey and PrefixTree encodings to work with that codec.
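
The first option above (extra enum entries picked when the codec understands 
tags) could look roughly like this.  The enum below is a hypothetical 
stand-in, not the real DataBlockEncoding enum, and the "_WITH_TAGS" naming 
and the select() helper are assumptions made only for illustration.

```java
// Hypothetical sketch; NOT the real org.apache.hadoop.hbase DataBlockEncoding.
final class EncodingChoiceSketch {
    enum Encoding {
        PREFIX(false), DIFF(false), FAST_DIFF(false),
        PREFIX_WITH_TAGS(true), DIFF_WITH_TAGS(true), FAST_DIFF_WITH_TAGS(true);

        final boolean tagAware;

        Encoding(boolean tagAware) {
            this.tagAware = tagAware;
        }
    }

    /**
     * Returns the tag-aware variant of the configured encoding when the codec
     * understands tags; otherwise returns the configured encoding unchanged.
     */
    static Encoding select(Encoding configured, boolean codecUnderstandsTags) {
        if (!codecUnderstandsTags || configured.tagAware) {
            return configured;
        }
        // Relies on the naming convention assumed above.
        return Encoding.valueOf(configured.name() + "_WITH_TAGS");
    }
}
```

The decision is made once, at the point where the encoding is chosen, so the 
individual algorithms need no if/else checks.  The price is that the enum 
doubles in size and every new algorithm needs a tag-aware twin.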

One more thing would be to change the way the DataBlockEncoders work.  As you 
can see in (1), since the block encoders work on the HFileBlocks we are not 
able to make the most of the codec way of encoding and decoding.  So we could 
make it work per KV, in the sense
HFileWriter->append(kv) -> Codec.encode(kv) -> Create the encoded buffer -> 
Fill in the buffer till the block size is reached.
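
A minimal sketch of this per-KV flow, under stated assumptions: KvCodec and 
Writer are hypothetical names, a KV is reduced to a key/value string pair, and 
a block is cut before an entry that would not fit (the real writer's block 
boundary rule may well differ).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-KV encoding writer; not the real HFile writer.
final class PerKvWriterSketch {
    /** Stand-in for a codec that encodes one KV at a time. */
    interface KvCodec {
        byte[] encode(String key, String value);
    }

    static final class Writer {
        private final KvCodec codec;
        private final int blockSize;
        private final List<byte[]> finishedBlocks = new ArrayList<>();
        private ByteBuffer current;

        Writer(KvCodec codec, int blockSize) {
            this.codec = codec;
            this.blockSize = blockSize;
            this.current = ByteBuffer.allocate(blockSize);
        }

        /** Encodes the KV immediately and appends it to the current block. */
        void append(String key, String value) {
            byte[] encoded = codec.encode(key, value);
            if (current.position() > 0 && current.remaining() < encoded.length) {
                finishBlock(); // block is full: cut it and start a new one
            }
            if (encoded.length > current.remaining()) {
                // Assumption: an oversized KV gets a block of its own.
                current = ByteBuffer.allocate(encoded.length);
            }
            current.put(encoded);
        }

        /** Flushes any partially filled block and returns all blocks. */
        List<byte[]> close() {
            if (current.position() > 0) {
                finishBlock();
            }
            return finishedBlocks;
        }

        private void finishBlock() {
            current.flip();
            byte[] block = new byte[current.remaining()];
            current.get(block);
            finishedBlocks.add(block);
            current = ByteBuffer.allocate(blockSize);
        }
    }
}
```

Compared with flow (1), the block buffer here only ever holds encoded bytes, 
so there is no intermediate plain block and no second pass over it.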

As you can see, all the above changes have an impact on the core code and we 
need a good amount of changes to do this.  Considering the effort going into 
0.96, this would be a major effort.
One suggestion that we would like to make, also going by Stack's earlier 
comment, is that HFileV3 would be a viable solution.
So HFileV3 would be the one that knows about Tags, and the read and write 
paths in HFileV3 would understand tags.  This would also mean that the 
DataBlockEncoder code path will have some ugly if/else checks to handle the 
code flow with and without Tags (or something similar).  I think this would 
give us Tag support in the 0.96 code base; the same could then be changed, 
based on discussion in the community, for 0.98 to use the codec and also make 
the code talk in terms of Cells.
I can raise a discussion/vote on the dev list for this.  It would be great if 
we can come to a consensus on this.
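
For illustration, the kind of version check that an HFileV3 read/write path 
implies might look like the sketch below.  The layout used here (key length, 
value length, key, value, then a 2-byte tags length and the tag bytes for V3 
only) is an assumption for the example, and VersionedCellSketch and its 
methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of version-dependent cell layout; names are made up.
final class VersionedCellSketch {
    static final int V2 = 2;
    static final int V3 = 3;

    static int serializedSize(int version, byte[] key, byte[] value, byte[] tags) {
        int size = 4 + 4 + key.length + value.length;
        if (version >= V3) {
            size += 2 + tags.length; // tags section exists only from V3 on
        }
        return size;
    }

    /** Writes one cell; tags are silently dropped for versions below V3. */
    static ByteBuffer write(int version, byte[] key, byte[] value, byte[] tags) {
        ByteBuffer out = ByteBuffer.allocate(serializedSize(version, key, value, tags));
        out.putInt(key.length).putInt(value.length).put(key).put(value);
        if (version >= V3) {
            // Assumes tags.length fits in an unsigned 16-bit value.
            out.putShort((short) tags.length).put(tags);
        }
        out.flip();
        return out;
    }

    /** Reads the tags of one cell; returns an empty array below V3. */
    static byte[] readTags(int version, ByteBuffer cell) {
        ByteBuffer in = cell.duplicate();
        int keyLen = in.getInt();
        int valueLen = in.getInt();
        in.position(in.position() + keyLen + valueLen); // skip key and value
        if (version < V3) {
            return new byte[0];
        }
        byte[] tags = new byte[in.getShort() & 0xFFFF];
        in.get(tags);
        return tags;
    }
}
```

Every place that computes sizes or offsets needs the same branch, which is 
where the "ugly if/else checks" in the encoder code path would come from.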
  
                
> Implement tags and the internals of how a tag should look like
> --------------------------------------------------------------
>
>                 Key: HBASE-8496
>                 URL: https://issues.apache.org/jira/browse/HBASE-8496
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.98.0
>
>         Attachments: Tag design.pdf
>
>
> The intent of this JIRA comes from HBASE-7897.
> This would help us decide on the structure and format of the tags, i.e. how 
> they should look. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
