[jira] [Commented] (HBASE-7221) RowKey utility class for rowkey construction

Nick Dimiduk (JIRA) Tue, 26 Feb 2013 17:41:14 -0800

    [ https://issues.apache.org/jira/browse/HBASE-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587857#comment-13587857 ]


Nick Dimiduk commented on HBASE-7221:
-------------------------------------

I believe this ticket is redundant to, not compatible with, HBASE-7692. They 
tread similar ground but with different intention. Please correct me if I'm 
wrong, but this ticket seeks to add a convenience for building byte[] values 
from component pieces and make it easy to read the pieces back out again. It 
only mentions ordering of the serialized representation by way of 
[~lhofhansl]'s 
[comment|https://issues.apache.org/jira/browse/HBASE-7221?focusedCommentId=13584577&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13584577];
 it otherwise does not address the issue. HBASE-7692 directly targets the 
ordered serialization problem. It also provides an implementation for building 
byte[]s from component pieces, which is built on top of the ordered 
serialization implementation. The result is a composite byte[] that maintains 
sort order with respect to the components.

By my read, RowKeySchema in this ticket looks roughly equivalent to 
StructRowKey added in 7692. Both encapsulate an ordered sequence of 
serializable *types*. FixedLengthRowKey in this ticket handles reading and 
writing *values* to a byte[] according to the RowKeySchema. In 7692, these 
operations are encapsulated in the serialize and deserialize methods on 
StructRowKey, which in turn delegate to the component \*RowKey implementations. 
This ticket's RowKeyElement class appears to use a fixed-length data encoding 
implicitly because it does not rely on the value in question to produce an 
encoded length. In 7692, this concern is delegated to the \*RowKey 
implementations. This ticket provides explicit support for hashing byte[] 
values using either of two provided algorithms. 7692 does no hashing for the 
user as it stands. This feature could be added if desired.
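
To illustrate the fixed-length point above, here is a minimal sketch (not code 
from either patch). FixedSchemaSketch and its methods are hypothetical names. 
It assumes every component has a width declared up front, so the offset of a 
component can be computed from the schema alone, without inspecting any value.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the RowKeySchema or StructRowKey API.
final class FixedSchemaSketch {
    private final int[] widths; // byte width of each component, in key order
    private final int total;    // total key length implied by the schema

    FixedSchemaSketch(int... widths) {
        this.widths = widths.clone();
        this.total = Arrays.stream(widths).sum();
    }

    // The offset of component i depends only on the schema, never on the values.
    int offsetOf(int i) {
        int off = 0;
        for (int k = 0; k < i; k++) {
            off += widths[k];
        }
        return off;
    }

    // Copy each already-encoded component into its fixed slot.
    byte[] write(byte[]... parts) {
        if (parts.length != widths.length) {
            throw new IllegalArgumentException("wrong number of components");
        }
        byte[] key = new byte[total];
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].length != widths[i]) {
                throw new IllegalArgumentException("component " + i + " has the wrong width");
            }
            System.arraycopy(parts[i], 0, key, offsetOf(i), widths[i]);
        }
        return key;
    }

    // Read component i back out of a key produced by write().
    byte[] read(byte[] key, int i) {
        int off = offsetOf(i);
        return Arrays.copyOfRange(key, off, off + widths[i]);
    }
}
```

A variable-width encoding cannot compute offsets this way. The component 
encoder has to report or delimit its own length.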

The major difference is in this ticket's RowKeyDataConverter. It does the work 
of serializing and deserializing values, using the existing util.Bytes. 
This is where HBASE-7692 aims to provide an entirely different feature: a 
serialization format whose byte order is consistent with the natural ordering 
of the values. It does so via the rest of the \*RowKey implementations.
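
A small sketch of why this matters, under stated assumptions. 
java.nio.ByteBuffer stands in for util.Bytes so the example has no HBase 
dependency; both write a long as 8 big-endian two's-complement bytes. The class 
and method names are hypothetical. The sign-bit flip shown is a common 
order-preserving trick and is not claimed to be the encoding HBASE-7692 uses.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only.
final class OrderedLongSketch {
    // Plain big-endian two's complement, the layout Bytes.toBytes(long) produces.
    static byte[] plain(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Flip the sign bit so that negative values sort before non-negative ones
    // when the bytes are compared as unsigned values.
    static byte[] ordered(long v) {
        return plain(v ^ Long.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, which is how HBase orders row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 is 0xFF..FF and sorts after 1, which is the wrong order.
        System.out.println(compareBytes(plain(-1L), plain(1L)) > 0);
        // Sign-flipped encoding: -1 sorts before 1, matching numeric order.
        System.out.println(compareBytes(ordered(-1L), ordered(1L)) < 0);
    }
}
```

Both lines print true: the plain encoding misorders negative values, and the 
sign-flipped encoding restores numeric order.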

Yes, a RowKeyDataConverter could be implemented that made use of the 
serializers in HBASE-7692. It would make decisions for the user regarding how 
to represent the data, including whether to use a fixed- or variable-width 
encoding format (features not provided by this ticket).
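
As a rough illustration of what the variable-width choice involves, the sketch 
below compares two ways of building a composite key from strings. All names are 
hypothetical, and the 0x00 terminator is only one possible scheme, assumed here 
for simplicity. It is not taken from either patch and only works if no 
component contains a 0x00 byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only.
final class VarWidthSketch {
    // Naive composite: concatenate the raw UTF-8 bytes of each component.
    static byte[] naive(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String p : parts) {
            byte[] b = p.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Terminated composite: append 0x00 after each component.
    // Assumption: components never contain a 0x00 byte themselves.
    static byte[] terminated(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String p : parts) {
            byte[] b = p.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            out.write(0);
        }
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison, which is how HBase orders row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As tuples, ("a", "z") sorts before ("ab", "a") because "a" < "ab".
        // Naive concatenation compares "az" with "aba" and gets this wrong.
        System.out.println(compareBytes(naive("a", "z"), naive("ab", "a")) > 0);
        // With terminators, the 0x00 after "a" sorts before the 'b' in "ab".
        System.out.println(compareBytes(terminated("a", "z"), terminated("ab", "a")) < 0);
    }
}
```

Both lines print true. A fixed-width encoding avoids the problem by padding, at 
the cost of a maximum component length.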

There is one other key feature that is omitted from both implementations under 
discussion. Neither implementation goes the extra mile of serializing schema 
details into the representations they produce (see also: Avro). I think this is 
an extremely useful (necessary) feature for a long-term serialization format. 
Without this, any change to decisions we make here will require rewriting data 
stored in a previous format. I've not investigated how/if this can be done 
while maintaining the order-preserving nature of the serialization strategy. It 
may be that the two features are mutually exclusive by some necessity of one or 
the other.
                
> RowKey utility class for rowkey construction
> --------------------------------------------
>
>                 Key: HBASE-7221
>                 URL: https://issues.apache.org/jira/browse/HBASE-7221
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE_7221.patch, hbase-common_hbase_7221_2.patch, 
> hbase-common_hbase_7221_v3.patch, hbase-common_hbase_7221_v4.patch, 
> hbase-server_hbase_7221_v5.patch, hbase-server_hbase_7221_v6.patch
>
>
> A common question in the dist-lists is how to construct rowkeys, particularly 
> composite keys.  Put/Get/Scan specifies byte[] as the rowkey, but it's up to 
> you to sensibly populate that byte-array, and that's where things tend to go 
> off the rails.
> This RowKey utility class isn't meant to add functionality into 
> Put/Get/Scan, but rather to make it simpler for folks to construct said arrays.  
> Example:
> {code}
>    // a and b are the caller's values; the buffer is sized for an MD5 hash plus a long
>    RowKey key = RowKey.create(RowKey.SIZEOF_MD5_HASH + RowKey.SIZEOF_LONG);
>    key.addHash(a);
>    key.add(b);
>    byte[] bytes = key.getBytes();
> {code} 
