[
https://issues.apache.org/jira/browse/HBASE-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587857#comment-13587857
]
Nick Dimiduk commented on HBASE-7221:
-------------------------------------
I believe this ticket is redundant to, not compatible with, HBASE-7692. They
tread similar ground but with different intention. Please correct me if I'm
wrong, but this ticket seeks to add a convenience for building byte[] values
from component pieces and to make it easy to read the pieces back out again. It
only mentions ordering of the serialized representation by way of
[~lhofhansl]'s
[comment|https://issues.apache.org/jira/browse/HBASE-7221?focusedCommentId=13584577&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13584577];
it otherwise does not address the issue. HBASE-7692 directly targets the
ordered serialization problem. It also provides an implementation for building
byte[]s from component pieces, which is built on top of the ordered
serialization implementation. The result is a composite byte[] that maintains
sort order with respect to the components.
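
To make "maintains sort order with respect to the components" concrete, here is a minimal sketch using only the JDK. It is not the HBASE-7692 implementation; the class and helper names (CompositeKeySketch, encodeString, encodeLong, compositeKey) are hypothetical, and the 0x00 terminator assumes the string component never contains a NUL character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompositeKeySketch {
    // Hypothetical helper: flip the sign bit so that unsigned byte order
    // matches signed numeric order, then write big-endian.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Hypothetical helper: UTF-8 bytes followed by a 0x00 terminator.
    // Assumption: the string contains no NUL character.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(utf8, utf8.length + 1); // padded byte is 0x00
    }

    // Concatenate the encoded components in declaration order.
    static byte[] compositeKey(String name, long id) {
        byte[] a = encodeString(name);
        byte[] b = encodeLong(id);
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // ("a", 5) sorts before ("ab", 1) because "a" < "ab".
        System.out.println(compareBytes(compositeKey("a", 5L), compositeKey("ab", 1L)) < 0); // true
        // With equal names, the long component decides, including negatives.
        System.out.println(compareBytes(compositeKey("a", -1L), compositeKey("a", 1L)) < 0); // true
    }
}
```

The terminator is what keeps a short string from being compared against the bytes of the following component; a fixed-width encoding would avoid it at the cost of padding.
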
By my read, RowKeySchema in this ticket looks roughly equivalent to
StructRowKey added in 7692. Both encapsulate an ordered sequence of
serializable *types*. FixedLengthRowKey in this ticket handles reading and
writing *values* to a byte[] according to the RowKeySchema. In 7692, these
operations are encapsulated in the serialize, deserialize methods on
StructRowKey, which in turn delegate to the component \*RowKey implementations.
This ticket's RowKeyElement class appears to use a fixed-length data encoding
implicitly because it does not rely on the value in question to produce an
encoded length. In 7692, this concern is delegated to the \*RowKey
implementations. This ticket provides explicit support for hashing byte[]
values using either of two provided algorithms. 7692 does no hashing for the
user as it stands. This feature could be added if desired.
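
For reference, the hashed-prefix style of key this ticket supports can be approximated with the JDK alone. This is a rough sketch, not the patch's code: HashedKeySketch, md5 and hashedKey are hypothetical names, and MD5 stands in for whichever of the two algorithms the patch provides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedKeySketch {
    static final int SIZEOF_MD5_HASH = 16; // an MD5 digest is always 16 bytes

    // Hash an arbitrary byte[] component down to a fixed 16-byte prefix.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Fixed-length key: 16-byte hash of 'a' followed by 8-byte big-endian 'b'.
    static byte[] hashedKey(byte[] a, long b) {
        return ByteBuffer.allocate(SIZEOF_MD5_HASH + Long.BYTES)
                .put(md5(a))
                .putLong(b)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = hashedKey("user42".getBytes(StandardCharsets.UTF_8), 7L);
        System.out.println(key.length); // 24
    }
}
```

Note that the hash prefix deliberately discards the ordering of the hashed component, so a key built this way is fixed-length but not order-preserving with respect to that component.
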
The major difference is in this ticket's RowKeyDataConverter. It does the work
of serializing and deserializing values. It does so using the existing util.Bytes.
This is where HBASE-7692 aims to provide an entirely different feature: a
serialization format that preserves order consistent with the natural
representation. It does so via the rest of the \*RowKey implementations.
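
A small illustration of why the encoding matters for order, assuming a plain big-endian two's complement layout for longs (which is my understanding of what util.Bytes produces; ByteBuffer is used here as a stand-in). The sign-bit flip shown is one common order-preserving technique, not necessarily the exact format in HBASE-7692, and the names OrderPreservingLong, plain, ordered and decodeOrdered are hypothetical.

```java
import java.nio.ByteBuffer;

public class OrderPreservingLong {
    // Plain big-endian two's complement encoding.
    static byte[] plain(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Same layout with the sign bit flipped, so negatives sort first.
    static byte[] ordered(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Inverse of ordered(): flip the sign bit back.
    static long decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 is 0xFF.., so it sorts after 1.
        System.out.println(compareUnsigned(plain(-1L), plain(1L)) > 0); // true
        // Sign-flipped encoding: byte order agrees with numeric order.
        System.out.println(compareUnsigned(ordered(-1L), ordered(1L)) < 0); // true
        System.out.println(decodeOrdered(ordered(-42L))); // -42
    }
}
```
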
Yes, a RowKeyDataConverter could be implemented that made use of the
serializers in HBASE-7692. It would make decisions for the user regarding how
to represent the data, including whether to use a fixed- or variable-width
encoding format (features not provided by this ticket).
There is one other key feature that is omitted from both implementations under
discussion. Neither implementation goes the extra mile of serializing schema
details into the representations they produce (see also: Avro). I think this is
an extremely useful (necessary) feature for a long-term serialization format.
Without this, any change to decisions we make here will require rewriting data
stored in a previous format. I've not investigated how/if this can be done
while maintaining the order-preserving nature of the serialization strategy. It
may be that the two features are mutually exclusive by some necessity of one or
the other.
> RowKey utility class for rowkey construction
> --------------------------------------------
>
> Key: HBASE-7221
> URL: https://issues.apache.org/jira/browse/HBASE-7221
> Project: HBase
> Issue Type: Improvement
> Components: util
> Reporter: Doug Meil
> Assignee: Doug Meil
> Priority: Minor
> Attachments: HBASE_7221.patch, hbase-common_hbase_7221_2.patch,
> hbase-common_hbase_7221_v3.patch, hbase-common_hbase_7221_v4.patch,
> hbase-server_hbase_7221_v5.patch, hbase-server_hbase_7221_v6.patch
>
>
> A common question in the dist-lists is how to construct rowkeys, particularly
> composite keys. Put/Get/Scan specifies byte[] as the rowkey, but it's up to
> you to sensibly populate that byte-array, and that's where things tend to go
> off the rails.
> This RowKey utility class isn't meant to add functionality into
> Put/Get/Scan, but rather to make it simpler for folks to construct said arrays.
> Example:
> {code}
> RowKey key = RowKey.create(RowKey.SIZEOF_MD5_HASH + RowKey.SIZEOF_LONG);
> key.addHash(a);
> key.add(b);
> byte bytes[] = key.getBytes();
> {code}