[ https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708228#comment-13708228 ]
ramkrishna.s.vasudevan commented on HBASE-8496:
-----------------------------------------------

I would like to get your suggestions on this.

-> The ideal solution would be to use the codecs to work with Tags. But the current APIs in Codec are better suited to the WAL and RPC than to an HFile-level implementation. At least, they are not straightforward to use there. An HFile scan involves positional seeks: most of the time we do not read KV by KV, but keep repositioning the byte buffer and form the KVs from it. Using the codec would not allow this, because Codec.advance() only advances one KV at a time, and this type of advance() is not suitable for a positional seek. So we would either have to introduce APIs like previous() or backward() (for seekBefore) in the CellScanner, or create a Seeker interface in the Decoder (like the Seeker in the DataBlockEncoders) and implement the positional seeks in it. Doing this type of positional seek in the Util classes was discussed with Stack, but I am a bit reluctant to do that.

-> Another problem with using Codec would be with the DataBlockEncoders. The way the DataBlockEncoders work now is:

HFileWriter -> append(kv) -> form the HFileBlock byte buffer -> the Encoders read the byte buffer -> encode per KV into a new byte buffer -> the new byte buffer is persisted.

Read flow
==========
Read the encoded byte buffer -> the Seekers in the DataBlockEncoders decode the byte buffer to form the actual byte buffer.

When we try to use Codec we may want to modify this as:

HFileWriter -> Codec.encode(kv) -> form the HFileBlock byte buffer -> Codec.decode(ByteBuffer) to form KVs -> encode per KV into a new byte buffer -> [if this KV has tags we may again need an encoder here for the tags] -> the new byte buffer is persisted. (1)

Read flow
==========
Read the encoded byte buffer -> the Seekers in the DataBlockEncoders decode the byte buffer to form the actual byte buffer.

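To make the seekBefore concern concrete, here is a minimal sketch of a positional seek over a block buffer. It is an illustration only, not HBase code: the class name, the method names and the entry layout ([int key length][key bytes], sorted ASCII keys) are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, not HBase code. A "block" here is a run of
// [int keyLength][key bytes] entries, sorted by key (ASCII keys assumed).
final class PositionalSeekSketch {

    /** Serializes already-sorted keys into one block buffer. */
    static ByteBuffer writeBlock(List<String> sortedKeys) {
        int size = 0;
        for (String k : sortedKeys) {
            size += Integer.BYTES + k.getBytes(StandardCharsets.US_ASCII).length;
        }
        ByteBuffer block = ByteBuffer.allocate(size);
        for (String k : sortedKeys) {
            byte[] bytes = k.getBytes(StandardCharsets.US_ASCII);
            block.putInt(bytes.length).put(bytes);
        }
        block.flip(); // position = 0, limit = number of bytes written
        return block;
    }

    /** Reads the key at the buffer's current position and moves past it. */
    private static String readKey(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /**
     * Returns the last key strictly smaller than target, or null if none exists.
     * For brevity every key is decoded for the comparison; the point of the
     * sketch is the backward reposition to a remembered offset at the end.
     */
    static String seekBefore(ByteBuffer block, String target) {
        ByteBuffer buf = block.duplicate(); // independent position, shared bytes
        int previousOffset = -1;
        while (buf.hasRemaining()) {
            int offset = buf.position();
            if (readKey(buf).compareTo(target) >= 0) {
                break; // reached the target or went past it
            }
            previousOffset = offset;
        }
        if (previousOffset < 0) {
            return null; // nothing sorts before the target
        }
        buf.position(previousOffset); // positional jump backwards
        return readKey(buf);
    }
}
```

A forward-only advance() has no equivalent of the final buf.position(previousOffset) step, so a scanner built on it would have to buffer the previous cell or restart from the beginning of the block. That is the gap a previous()/backward() API or a Seeker interface would have to fill.
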
One thing to note is that we may have to rework all the encoding algorithms to work with Tags, either by subclassing the existing ones or by writing new ones. How can this decision be made? Here again we have a few options:

-> If the user has tags, add new encoding algorithms to the DataBlockEncoding enum, like PrefixKeyDeltaEncodingWithTags, FastDiffkeyEncodingWithTags, etc. Whenever we see that the codec used for the HFile is able to understand tags, we just use the new algorithms.

-> The other way would be to let the code internally instantiate the new classes and use them to handle Tags as well. But this would involve code changes with some if/else checks, and it would apply to every algorithm. If a new codec is added tomorrow, we would have to keep doing this.

-> Another thing that Anoop suggested was to have a new HFileCodec, which would internally have the HFileCompressedEncoder. Every time a new type of codec is added, it would be up to the user to implement PrefixKey, FastDiff, DiffKey and PrefixTree to work with that codec.

One more thing would be to change the way the DataBlockEncoders work. As you can see in (1), since the block encoders work on the HFileBlocks, we are not able to make the most of the codec way of encoding and decoding. So we could make it work per KV, in the sense: HFileWriter -> append(kv) -> Codec.encode(kv) -> create the encoded buffer -> fill the buffer until the block size is reached.

As you can see, all of the above changes have an impact on the core code, and a good amount of change is needed to do this. Considering the work already going into 0.96, this would be a major effort.

One suggestion that we would like to make, also in light of Stack's earlier comment, is that HFileV3 would be a viable solution. HFileV3 would be the version that knows about Tags, and the read and write paths in HFileV3 would understand tags.

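As a rough illustration of the per-KV write path suggested earlier in this comment (append(kv) -> encode -> fill the buffer until the block size is reached), here is a small sketch. Everything in it is hypothetical: KvEncoder is a stand-in and not the HBase Codec API, and the length-prefixed key/value/tags layout is made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code: each KV is encoded on append and the
// current block is closed as soon as it reaches the configured block size.
final class PerKvBlockWriterSketch {

    /** Stand-in for a codec encoder (an assumption, not the HBase Codec API). */
    interface KvEncoder {
        byte[] encode(byte[] key, byte[] value, byte[] tags);
    }

    /** Made-up layout: [int keyLen][key][int valueLen][value][short tagsLen][tags]. */
    static byte[] encodeWithTags(byte[] key, byte[] value, byte[] tags) {
        ByteBuffer b = ByteBuffer.allocate(
            Integer.BYTES + key.length + Integer.BYTES + value.length + Short.BYTES + tags.length);
        b.putInt(key.length).put(key)
         .putInt(value.length).put(value)
         .putShort((short) tags.length).put(tags);
        return b.array();
    }

    private final KvEncoder encoder;
    private final int blockSize;
    private final List<byte[]> finishedBlocks = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();

    PerKvBlockWriterSketch(KvEncoder encoder, int blockSize) {
        this.encoder = encoder;
        this.blockSize = blockSize;
    }

    /** Encodes one KV straight into the current block; no second pass over the block. */
    void append(byte[] key, byte[] value, byte[] tags) {
        byte[] encoded = encoder.encode(key, value, tags);
        current.write(encoded, 0, encoded.length);
        if (current.size() >= blockSize) {
            finishBlock();
        }
    }

    /** Flushes a partially filled last block, if any. */
    void close() {
        if (current.size() > 0) {
            finishBlock();
        }
    }

    private void finishBlock() {
        finishedBlocks.add(current.toByteArray()); // a real writer would persist this
        current = new ByteArrayOutputStream();
    }

    List<byte[]> blocks() {
        return finishedBlocks;
    }
}
```

In this shape the block boundary is decided from the encoded size, and whether tags are written is entirely the encoder's business. That is the property the per-KV proposal is after, at the cost of changing how the existing block encoders are driven.
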
Going with HFileV3 would also mean that the DataBlockEncoder code path will have some ugly if/else checks to handle the flow with and without Tags (or something similar). I think this would give us Tag support in the 0.96 code base. The same could then be changed for 0.98, based on discussion in the community, to use the codec and to make the code talk in terms of Cells.

I can raise a discussion/vote on the dev list for this. It would be great if we can come to a consensus on this.

> Implement tags and the internals of how a tag should look like
> --------------------------------------------------------------
>
> Key: HBASE-8496
> URL: https://issues.apache.org/jira/browse/HBASE-8496
> Project: HBase
> Issue Type: New Feature
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.98.0
>
> Attachments: Tag design.pdf
>
>
> The intent of this JIRA comes from HBASE-7897.
> This would help us to decide on the structure and format of how the tags
> should look like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira