[
https://issues.apache.org/jira/browse/AVRO-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974700#action_12974700
]
Adam Warrington commented on AVRO-712:
--------------------------------------
Scott Carey:
I think some usefulness can come from the ability to use Avro entities with
systems that use memcmp to sort binary data. For example, keys in an HBase
table. One could create multi-component keys for an HBase table using Avro, and
have guarantees about how their data is to be sorted. Say I'm storing blog
posts in HBase and want to group blog posts over time by domain within a table.
I could create an Avro schema:
{ "type": "record",
  "name": "author_blog_key",
  "fields": [
    { "name": "domain", "type": "string" },
    { "name": "timestamp", "type": "long" }
  ]}
If the memcmp sort ordering could be guaranteed, I could use serialized
instances of this key within systems that sort data using memcmp.
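As a rough illustration of the kind of guarantee meant here, the sketch below
encodes the (domain, timestamp) key so that plain byte comparison matches
field-by-field comparison. It is NOT the encoding from the attached prototype
or from Scott's proposal; the escaping scheme and the helper names
(encode_string, encode_long, encode_key) are assumptions made up for this
example. The standard Avro binary encoding (length-prefixed strings, zig-zag
varint longs) does not have this property.

```python
import struct

def encode_string(value: str) -> bytes:
    """Hypothetical order-preserving string encoding (an assumption, not Avro's).

    UTF-8 bytes, with each 0x00 escaped as 0x00 0x01 and a 0x00 0x00
    terminator, so a string always sorts before any longer string that
    starts with it.
    """
    raw = value.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\x01") + b"\x00\x00"

def encode_long(value: int) -> bytes:
    """Hypothetical order-preserving long encoding (an assumption, not Avro's).

    Shifts the signed 64-bit range onto the unsigned range and writes it
    big-endian, so negative values sort before positive ones.
    """
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit long")
    return struct.pack(">Q", value + (1 << 63))

def encode_key(domain: str, timestamp: int) -> bytes:
    """Concatenate the fields in schema order: domain first, then timestamp."""
    return encode_string(domain) + encode_long(timestamp)

keys = [
    ("example.org", 5),
    ("example.com", -1),
    ("example.com", 300),
    ("example", 7),
]

# Python compares bytes objects the way memcmp does (unsigned, left to right,
# with a shorter prefix sorting first), so this stands in for memcmp ordering.
by_bytes = sorted(keys, key=lambda k: encode_key(*k))
by_fields = sorted(keys, key=lambda k: (k[0].encode("utf-8"), k[1]))
assert by_bytes == by_fields
```

The fixed 8-byte long and the escaped, terminated string both take more space
than the standard encoding, which is the trade-off discussed next.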
It's clear that this type of encoding will cost more in both time and space,
especially for bytes, strings, and arrays. Your proposed encoding of ints and
longs is clever, and I like the idea of putting ignored fields at the end of a
record if equivalency isn't required (which in many cases it isn't).
> define memcmp'able encoding
> ---------------------------
>
> Key: AVRO-712
> URL: https://issues.apache.org/jira/browse/AVRO-712
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Doug Cutting
> Attachments: memcmp_encoding_prototype.py
>
>
> It would be useful to have an encoding for Avro data that ordered data
> according to the Avro specification under memcmp.