[ 
https://issues.apache.org/jira/browse/AVRO-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974700#action_12974700
 ] 

Adam Warrington commented on AVRO-712:
--------------------------------------

Scott Carey:

I think this would be useful for systems that sort binary data with memcmp, 
such as keys in an HBase table. One could build multi-component HBase keys 
with Avro and have guarantees about how the data will be sorted. Say I'm 
storing blog posts in HBase and want to group them by domain, ordered by time 
within each domain. I could create an Avro schema:

{ "type": "record",
  "name": "author_blog_key",
  "fields": [
    { "name": "domain", "type": "string" },
    { "name": "timestamp", "type": "long" }
  ]}

If the memcmp sort ordering could be guaranteed, I could use serialized 
instances of this record as keys in any system that sorts data with memcmp.
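
To make the idea concrete, here is a minimal Python sketch of a key encoding 
that sorts correctly under bytewise comparison. It is NOT the encoding proposed 
in this issue or in the attached prototype. The function names and the byte 
layout (a zero-escaped, zero-terminated string and a fixed-width, sign-flipped 
long) are assumptions chosen only for illustration.

```python
import struct

def encode_string(s: str) -> bytes:
    # Hypothetical layout: escape each 0x00 as 0x00 0x01 and terminate with
    # 0x00 0x00, so a string always sorts before any longer string it prefixes.
    raw = s.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\x01") + b"\x00\x00"

def encode_long(n: int) -> bytes:
    # Hypothetical layout: shift the signed 64-bit range to unsigned and write
    # it big-endian, so negative values sort before positive ones.
    return struct.pack(">Q", n + (1 << 63))

def encode_key(domain: str, timestamp: int) -> bytes:
    # Fields are concatenated in schema order: domain first, then timestamp.
    return encode_string(domain) + encode_long(timestamp)

keys = [
    ("b.example", 5),
    ("a.example", -1),
    ("a.example", 10),
    ("a", 99),
    ("a.example", 2),
]

# Python compares bytes objects lexicographically by unsigned byte value,
# which is the same ordering memcmp gives (with a shorter prefix sorting first).
by_bytes = sorted(keys, key=lambda k: encode_key(*k))
for domain, timestamp in by_bytes:
    print(domain, timestamp)
```

Sorting by the encoded bytes groups the keys by domain and orders them by 
timestamp within each domain, which is the behavior an HBase row key would need.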

It's clear that this type of encoding will cost more in both time and space, 
especially for bytes, strings, and arrays. Your proposed encoding of ints and 
longs is clever, and I like the idea of putting ignored fields at the end of a 
record when equivalency isn't required (which in many cases it isn't).


> define memcmp'able encoding
> ---------------------------
>
>                 Key: AVRO-712
>                 URL: https://issues.apache.org/jira/browse/AVRO-712
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>         Attachments: memcmp_encoding_prototype.py
>
>
> It would be useful to have an encoding for Avro data that ordered data 
> according to the Avro specification under memcmp.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.