[ https://issues.apache.org/jira/browse/AVRO-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919875#action_12919875 ]
Doug Cutting commented on AVRO-679:
-----------------------------------
Adding a new fundamental type or encoding is hard to do compatibly. Rather, I
wonder whether this could be layered on top, as a library. Perhaps one could
automatically rewrite schemas and have a layer that transforms data structures
accordingly. This could perhaps be done without copying data, by wrapping
DatumWriter and DatumReader implementations.
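A minimal Python sketch of that layering, under stated assumptions:
`TransformingWriter` and `TransformingReader` are hypothetical names, not Avro
classes, and the wrapped writer/reader are only assumed to expose
`write(datum)` and `read()`. In-memory stand-ins replace real encoders so the
example is self-contained.

```python
from typing import Any, Callable, List


class InMemoryWriter:
    """Stand-in for a real datum writer: collects datums in a list."""

    def __init__(self) -> None:
        self.written: List[Any] = []

    def write(self, datum: Any) -> None:
        self.written.append(datum)


class InMemoryReader:
    """Stand-in for a real datum reader: replays a list of datums in order."""

    def __init__(self, datums: List[Any]) -> None:
        self._datums = iter(datums)

    def read(self) -> Any:
        return next(self._datums)


class TransformingWriter:
    """Hypothetical wrapper: applies `to_wire` to each datum, then delegates."""

    def __init__(self, inner: Any, to_wire: Callable[[Any], Any]) -> None:
        self.inner = inner
        self.to_wire = to_wire

    def write(self, datum: Any) -> None:
        self.inner.write(self.to_wire(datum))


class TransformingReader:
    """Hypothetical wrapper: delegates, then applies `from_wire` to the result."""

    def __init__(self, inner: Any, from_wire: Callable[[Any], Any]) -> None:
        self.inner = inner
        self.from_wire = from_wire

    def read(self) -> Any:
        return self.from_wire(self.inner.read())


# Usage: store a set as a sorted list on the wire, read it back as a set.
sink = InMemoryWriter()
TransformingWriter(sink, to_wire=sorted).write({3, 1, 2})
print(sink.written)  # [[1, 2, 3]]
print(TransformingReader(InMemoryReader(sink.written), from_wire=set).read())
```

The underlying encoding never changes; only the wrapper knows about the
rewrite. Whether any data is copied depends on the transform supplied, not on
the wrapper itself.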
Also related is columnar compression in a data file. In this case, a data file
is a sequence of records whose schema might be rewritten. For example, a file
containing <string,long> pairs might be represented as a data file containing
<int,string,long> records, where the int holds the number of characters shared
with the previous string, the string holds the unshared suffix, and the long
holds the difference from the previous long. Schema properties could indicate
which fields should be represented as differences. If random access is
required, e.g., for MapReduce splitting, then the container (DataFileReader and
DataFileWriter in Java) might have per-block callbacks, e.g., to restart the
differencing at each block boundary.
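A minimal sketch of the <string,long> to <int,string,long> rewrite, assuming
the string field then carries only the unshared suffix and that differencing
restarts at every block (standing in for the per-block callback). The helpers
`encode_block` and `decode_block` are hypothetical, not part of Avro.

```python
import os
from typing import List, Tuple


def encode_block(pairs: List[Tuple[str, int]]) -> List[Tuple[int, str, int]]:
    """Rewrite <string,long> pairs as <int,string,long> records.

    int    = number of leading characters shared with the previous string
    string = the remaining (unshared) suffix
    long   = difference from the previous long
    State starts fresh for each block, so every block decodes on its own.
    """
    prev_s, prev_n = "", 0
    out = []
    for s, n in pairs:
        shared = len(os.path.commonprefix([prev_s, s]))
        out.append((shared, s[shared:], n - prev_n))
        prev_s, prev_n = s, n
    return out


def decode_block(records: List[Tuple[int, str, int]]) -> List[Tuple[str, int]]:
    """Invert encode_block, again starting from fresh per-block state."""
    prev_s, prev_n = "", 0
    out = []
    for shared, suffix, delta in records:
        s = prev_s[:shared] + suffix
        n = prev_n + delta
        out.append((s, n))
        prev_s, prev_n = s, n
    return out


block = [("apple", 100), ("applesauce", 105), ("apply", 103)]
print(encode_block(block))  # [(0, 'apple', 100), (5, 'sauce', 5), (4, 'y', -2)]
```

With sorted keys the shared prefixes tend to be long and the differences
small, and Avro writes int and long as zig-zag varints, so small magnitudes
take fewer bytes. Python integers do not overflow; a real 64-bit long field
would need to account for wraparound in the difference.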
> Improved encodings for arrays
> -----------------------------
>
> Key: AVRO-679
> URL: https://issues.apache.org/jira/browse/AVRO-679
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Stu Hood
> Priority: Minor
>
> There are better ways to encode arrays of varints [1] that are faster to
> decode and more space-efficient than encoding each varint independently.
> Extending the idea to other variable-length types like 'bytes' and
> 'string', you could encode the entries of an array block as an array of
> lengths followed by contiguous bytes/UTF-8 data.
> [1] group varint encoding: slides 57-63 of
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/WSDM09-keynote.pdf
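A minimal Python sketch of both ideas in the description. The helper names are
hypothetical, values are assumed to be unsigned 32-bit integers, and the bit
order inside the tag byte (first value in the low two bits) is an assumption
that may not match the exact layout in the slides. This is not Avro's current
binary encoding, which writes each array item independently.

```python
from typing import List, Tuple


def group_varint_encode(values: List[int]) -> bytes:
    """Encode unsigned 32-bit ints in groups of four.

    Each group is one tag byte (2 bits per value, holding byte length - 1)
    followed by the values' little-endian bytes. A trailing group may hold
    fewer than four values; the decoder must be told the total count
    (e.g. the array block's item count).
    """
    out = bytearray()
    for g in range(0, len(values), 4):
        tag = 0
        body = bytearray()
        for i, v in enumerate(values[g:g + 4]):
            if not 0 <= v < (1 << 32):
                raise ValueError("value out of unsigned 32-bit range")
            n = max(1, (v.bit_length() + 7) // 8)  # bytes needed, 1..4
            tag |= (n - 1) << (2 * i)
            body += v.to_bytes(n, "little")
        out.append(tag)
        out += body
    return bytes(out)


def group_varint_decode(data: bytes, count: int) -> Tuple[List[int], int]:
    """Decode `count` values; return them and the number of bytes consumed."""
    values: List[int] = []
    pos = 0
    while len(values) < count:
        tag = data[pos]
        pos += 1
        for i in range(min(4, count - len(values))):
            n = ((tag >> (2 * i)) & 3) + 1
            values.append(int.from_bytes(data[pos:pos + n], "little"))
            pos += n
    return values, pos


def encode_string_block(strings: List[str]) -> bytes:
    """All lengths first (group varint), then the UTF-8 bytes back to back."""
    encoded = [s.encode("utf-8") for s in strings]
    return group_varint_encode([len(b) for b in encoded]) + b"".join(encoded)


def decode_string_block(data: bytes, count: int) -> List[str]:
    """Read the lengths, then slice the contiguous data region."""
    lengths, pos = group_varint_decode(data, count)
    out = []
    for n in lengths:
        out.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    return out


print(group_varint_encode([1, 300, 70000, 5]).hex())  # 24012c0170110105
```

The intent, as in [1], is that a decoder reads one tag byte per four values
instead of testing a continuation bit on every byte; in the string layout, a
reader can compute every element's offset from the lengths alone.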