For the past couple of months, we have been working through various prototypes for supporting inline storage of tags in cells as persisted on disk. Our goals are to support optional use of tags with minimal changes to core code while also avoiding performance impacts to users who do not use tags.
For background, refer to these comments on HBASE-8496:
https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have iterated on a couple of prototypes: the first implemented tag awareness in DataBlockEncoders, and a later one implemented it as a new type of Codec for Cells. The trade-offs are discussed in the comments above. We think tag awareness in Cell Codecs is the right approach, but there are some shortcomings in the current interfaces internal to HFile that need to be addressed to avoid any performance impact for those who do not want to use inline tags, and addressing them may involve a large amount of code change.

By introducing inline tags in a new V3 version of HFile, we can avoid several problems with HFile V2 internals, as well as backwards compatibility concerns. This allows working tag support in a shipping version of HBase, with no performance impact and low risk for all HBase users who do not want tag support.

HFile V3 differs from earlier versions by supporting inline tag storage. It does not change the HFileBlock format; it only serializes and deserializes the tag information persisted in the HFile. Having HFile V3 also helps keep tags optional, so that existing cases where there are no tags are totally unaffected. We will also keep the changes outside of the V3 reader and writer minimal.

Compatibility with future versions would not be a problem when we move to Cell Codecs: the Codec used for writing a file will be persisted in the HFile header. For files that are either V2 or V3, we will instantiate two default codecs that know how to handle serialization with and without tags.
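To make the codec idea a bit more concrete, here is a rough sketch of how two default codecs could be chosen by HFile version. This is not the actual patch. Every name in it (TagCodecSketch, SimpleCell, CellCodec, NoTagsCodec, TagsCodec, codecFor) is hypothetical, and the byte layout (a 2-byte tags length followed by the tag bytes, appended after the value) is only an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these names are HBase classes.
class TagCodecSketch {

    /** Stand-in for a cell: key bytes, value bytes, and optional tag bytes. */
    public static final class SimpleCell {
        public final byte[] key;
        public final byte[] value;
        public final byte[] tags; // an empty array means "no tags"

        public SimpleCell(byte[] key, byte[] value, byte[] tags) {
            this.key = key;
            this.value = value;
            this.tags = tags;
        }
    }

    public interface CellCodec {
        void write(DataOutputStream out, SimpleCell cell) throws IOException;
        SimpleCell read(DataInputStream in) throws IOException;
    }

    /** Assumed V2-style layout: keyLen, valueLen, key, value. Tags are not written. */
    public static final class NoTagsCodec implements CellCodec {
        @Override
        public void write(DataOutputStream out, SimpleCell cell) throws IOException {
            out.writeInt(cell.key.length);
            out.writeInt(cell.value.length);
            out.write(cell.key);
            out.write(cell.value);
        }

        @Override
        public SimpleCell read(DataInputStream in) throws IOException {
            byte[] key = new byte[in.readInt()];
            byte[] value = new byte[in.readInt()];
            in.readFully(key);
            in.readFully(value);
            return new SimpleCell(key, value, new byte[0]);
        }
    }

    /** Assumed V3-style layout: the layout above, then a 2-byte tags length and the tags. */
    public static final class TagsCodec implements CellCodec {
        private final NoTagsCodec base = new NoTagsCodec();

        @Override
        public void write(DataOutputStream out, SimpleCell cell) throws IOException {
            if (cell.tags.length > 0xFFFF) {
                throw new IllegalArgumentException("tags do not fit in a 2-byte length");
            }
            base.write(out, cell);
            out.writeShort(cell.tags.length);
            out.write(cell.tags);
        }

        @Override
        public SimpleCell read(DataInputStream in) throws IOException {
            SimpleCell untagged = base.read(in);
            byte[] tags = new byte[in.readUnsignedShort()];
            in.readFully(tags);
            return new SimpleCell(untagged.key, untagged.value, tags);
        }
    }

    /** Picks a default codec from the file's major version (assumed rule: V3 and up carry tags). */
    public static CellCodec codecFor(int hfileMajorVersion) {
        return hfileMajorVersion >= 3 ? new TagsCodec() : new NoTagsCodec();
    }

    public static byte[] encode(CellCodec codec, SimpleCell cell) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            codec.write(out, cell);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static SimpleCell decode(CellCodec codec, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return codec.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a V2 reader or writer never touches tag bytes at all, which is the property we care about for users who do not use tags; only the V3 path pays for the extra length field and tag bytes.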
There have been earlier thoughts on an HFile V3, e.g.:
https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have been working on this and will have a clean patch with a good amount of testing in time for 0.96. Although our focus is on performance-neutral persistence of inline cell tags in 0.96 to enable a couple of security coprocessor users, introducing an HFile V3 also provides design freedom for other features and problems that can be developed through the 0.96 cycle into 0.98.

Please voice your opinion on this so that we can settle the approach and perhaps define the scope of the patch. Also feel free to comment on HBASE-8496 with your thoughts and ideas.

Regards
Ram