[ 
https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675541#comment-13675541
 ] 

ramkrishna.s.vasudevan commented on HBASE-8496:
-----------------------------------------------

The structure of a tag may look like this:
{color:red}
<1 byte type code><2 byte tag length><tag>
{color}

We need to provide some TagIterators inside CellUtil so that we can iterate 
over the tag array.
The iterator must use the above tag structure to parse each tag.
Other utility methods may also be needed, such as getNumTags() and a method 
that, given a type, returns the tags of that type.
If the structure carries the type in it, then it may not be possible to do 
validation on the client side for specific tag types.
The reason for having a type is to support different use cases for tags; the 
CP (coprocessor) that we add for each use case should help us achieve it.
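
A minimal sketch of how such an iterator and the helper methods could parse 
the tag layout above. All class and method names here (TagSketch, Tag, 
tagIterator, getTagsOfType) are hypothetical and not existing HBase APIs, and 
the big-endian byte order of the 2 byte length is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these names are actual HBase APIs.
final class TagSketch {
    /** One decoded tag: a type code plus its payload bytes. */
    static final class Tag {
        final byte type;
        final byte[] value;
        Tag(byte type, byte[] value) { this.type = type; this.value = value; }
    }

    /** Serializes tags back to back as <1 byte type><2 byte length><value>. */
    static byte[] encode(List<Tag> tags) {
        int size = 0;
        for (Tag t : tags) {
            if (t.value.length > 0xFFFF) {
                throw new IllegalArgumentException("tag too long for a 2-byte length");
            }
            size += 3 + t.value.length;
        }
        byte[] out = new byte[size];
        int pos = 0;
        for (Tag t : tags) {
            out[pos++] = t.type;
            out[pos++] = (byte) (t.value.length >>> 8); // assumed big-endian length
            out[pos++] = (byte) t.value.length;
            System.arraycopy(t.value, 0, out, pos, t.value.length);
            pos += t.value.length;
        }
        return out;
    }

    /** Walks the tag array stored in buf[offset, offset + length). */
    static Iterator<Tag> tagIterator(final byte[] buf, final int offset, final int length) {
        return new Iterator<Tag>() {
            private int pos = offset;
            private final int end = offset + length;

            @Override public boolean hasNext() { return pos < end; }

            @Override public Tag next() {
                if (!hasNext()) throw new NoSuchElementException();
                byte type = buf[pos];
                int len = ((buf[pos + 1] & 0xFF) << 8) | (buf[pos + 2] & 0xFF);
                byte[] value = Arrays.copyOfRange(buf, pos + 3, pos + 3 + len);
                pos += 3 + len;
                return new Tag(type, value);
            }

            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Counts the tags by walking the array once. */
    static int getNumTags(byte[] buf, int offset, int length) {
        int n = 0;
        for (Iterator<Tag> it = tagIterator(buf, offset, length); it.hasNext(); it.next()) n++;
        return n;
    }

    /** Returns only the tags whose type code matches the given type. */
    static List<Tag> getTagsOfType(byte[] buf, int offset, int length, byte type) {
        List<Tag> result = new ArrayList<Tag>();
        Iterator<Tag> it = tagIterator(buf, offset, length);
        while (it.hasNext()) {
            Tag t = it.next();
            if (t.type == type) result.add(t);
        }
        return result;
    }
}
```

Because each tag carries its own length, a reader can skip over tag types it 
does not understand without knowing how to interpret them.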

We also need to identify different use cases for tags other than Visibility 
and ACLs so that we can ensure that we provide proper client support for tags.  
Currently the idea is to go with the CP-based approach.
From the client perspective, will the tags now be added as part of Puts?
Put.add(KeyValue) will now have an option to pass a tag array. One more option 
that we thought of is to have OperationAttributes and set the tags there.
I tried out different options for getting tags working with KeyValues and the 
existing formats.
The KV can be modified to 
{color:red}
<keylength><valuelength><keyarray><valuearray><taglength><tagarray>
{color}
So even if a KV does not have any tags, the taglength will still be written as 
0, but there will not be any tag array.  
This will involve some changes to the format of the HFile writer and reader; 
probably a new version of the Writer/Reader is needed.  (A minor version should 
be enough?)
In the case of encoders, the base encoder BufferedDataEncoder will be tag 
aware; currently no encoding logic is applied to the tag part.  It is just 
written and parsed so that during a scan we are able to get the tags in the 
output KVs.
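
A rough sketch of the modified KV layout described above, where taglength is 
always written. The class and method names are hypothetical, and the 4 byte 
width used for taglength is an assumption, since the width is not decided 
here.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the actual KeyValue implementation.
final class TaggedKvSketch {
    /** Writes <keylength><valuelength><keyarray><valuearray><taglength>[<tagarray>]. */
    static byte[] write(byte[] key, byte[] value, byte[] tags) {
        ByteBuffer buf = ByteBuffer.allocate(12 + key.length + value.length + tags.length);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        buf.putInt(tags.length); // always written, 0 when the KV has no tags
        buf.put(tags);           // writes nothing when tags is empty
        return buf.array();
    }

    /** Skips the key and value of the KV at the buffer's position and returns its tag bytes. */
    static byte[] readTags(ByteBuffer buf) {
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen);
        byte[] tags = new byte[buf.getInt()]; // assumed 4-byte taglength
        buf.get(tags);
        return tags;
    }
}
```

With this layout every KV pays the fixed cost of the taglength field even when 
no tags are in use.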

The same applies to the PrefixTree codec.  In this case backward 
compatibility should be taken care of.

In case we don't want to do the above, one more thing that can be done is 
        {color:red}
        <Existing KV format><int – negative integer indicating the length of the tag><tag array>
        {color}
Here the negative length is written only when there is a tag; the existing KV 
format is left untouched when there is no tag.
In this approach, after every KV we would read the next int (which is the next 
KV's keylength when there is no tag) and decide from its sign whether a tag is 
present.  If no tag is present we just rewind the position of the buffer.
This has a performance impact but does not involve changes to the HFile formats.
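
The read-and-rewind step could look roughly like the following sketch. The 
names are hypothetical, and it assumes 4 byte keylength/valuelength fields and 
that a real keylength is never negative.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the "negative tag length" variant.
final class NegativeTagLengthSketch {
    /** Writes the existing KV format, then -taglength and the tags only if tags exist. */
    static void writeKv(ByteBuffer out, byte[] key, byte[] value, byte[] tags) {
        out.putInt(key.length).putInt(value.length).put(key).put(value);
        if (tags.length > 0) {
            out.putInt(-tags.length).put(tags);
        }
    }

    /** Moves the buffer past the key and value of the KV at the current position. */
    static void skipKeyValue(ByteBuffer in) {
        int keyLen = in.getInt();
        int valueLen = in.getInt();
        in.position(in.position() + keyLen + valueLen);
    }

    /** Called right after a KV: returns its tags, or an empty array if it has none. */
    static byte[] readOptionalTags(ByteBuffer in) {
        if (in.remaining() < 4) {
            return new byte[0]; // end of data, so no tag can follow
        }
        in.mark();
        int next = in.getInt();
        if (next >= 0) {
            in.reset(); // this was the next KV's keylength, so rewind
            return new byte[0];
        }
        byte[] tags = new byte[-next];
        in.get(tags);
        return tags;
    }
}
```

The extra read and possible rewind after every KV is where the performance 
impact mentioned above comes from.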


So in both cases we tend to write the tag info whether or not the user needs 
it.  One way to avoid this could be similar to what we do for MemstoreTS.
Add metadata to the HFile saying tagpresent = true/false based on the KVs in 
that HFile.  
Even if there is only one KV with a tag, this metadata will be true.
Now on compaction we will read this metadata and decide whether to compact the 
data with tags or without tags.
The advantage is that for scenarios where there are no tags we will not have a 
drop in read performance (this applies after compaction is done).
The downside of this approach is that the KeyValue format itself now has two 
representations.  Sometimes the KV that we retrieve will have tag info, 
sometimes it will not.
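
A small sketch of how the flag might be derived and used, under one reading of 
the idea above. The names are hypothetical, and the rule that a compaction 
writes tags whenever any input file has the flag set is an assumption.

```java
import java.util.List;

// Hypothetical sketch of the tagpresent file-level metadata.
final class TagPresenceSketch {
    /** File-level flag: true as soon as any KV written to the file carries tags. */
    static boolean tagsPresent(List<byte[]> tagArrayPerKv) {
        for (byte[] tags : tagArrayPerKv) {
            if (tags.length > 0) return true;
        }
        return false;
    }

    /** Assumed compaction rule: write tags if any input file has tagpresent = true. */
    static boolean writeTagsOnCompaction(List<Boolean> inputFileFlags) {
        for (boolean flag : inputFileFlags) {
            if (flag) return true;
        }
        return false;
    }
}
```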
Thanks to Anoop and Andy for their suggestions/inputs.

I have some patches ready for the above approaches, except for the optional tag 
part.  I wanted to know if that can be provided as a feature in the future?  
Anyway, I will try out the optional part as well to see what kinds of 
changes/issues we may face while implementing it.
Comments/feedback welcome.  I am open to hearing any other ideas as well.  
                
> Implement tags and the internals of how a tag should look like
> --------------------------------------------------------------
>
>                 Key: HBASE-8496
>                 URL: https://issues.apache.org/jira/browse/HBASE-8496
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.98.0
>
>
> The intent of this JIRA comes from HBASE-7897.
> This would help us to decide on the structure and format of how the tags 
> should look like. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
