[
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022217#comment-13022217
]
Scott Carey commented on AVRO-806:
----------------------------------
A useful reference is the "Data Model" and "Nested Columnar Storage" sections in
Google's Dremel paper: http://www.google.com/research/pubs/pub36632.html
With Avro, one challenge of columnar storage is going to be Unions.
If a record is
{
int,
[float, string],
long
}
How do you deal with the unions? Do you put the whole union in the same
columnar chunk? Do you split the union into chunks for each branch, and store
the union index in its own chunk?
How this interacts with records nested within unions, or unions inside arrays,
can be complicated too.
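To make the second option concrete, here is a minimal sketch in Python (not
Avro code, and not a proposed file format). It assumes the record above is
modeled as a tuple (int, float-or-string, long). The chunk names such as
"union.index" and the helper names split_union_columns and reassemble are
hypothetical, chosen only for illustration. Each union branch gets its own
chunk, and the branch index is stored in a separate chunk.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical in-memory model of the record {int, [float, string], long}.
Row = Tuple[int, Union[float, str], int]


def split_union_columns(rows: List[Row]) -> Dict[str, list]:
    """Write rows column by column, splitting the union per branch.

    The union index chunk records which branch each row took:
    0 = float branch, 1 = string branch.
    """
    cols: Dict[str, list] = {
        "int": [],
        "union.index": [],
        "union.float": [],
        "union.string": [],
        "long": [],
    }
    for first, union_value, last in rows:
        cols["int"].append(first)
        cols["long"].append(last)
        if isinstance(union_value, float):
            cols["union.index"].append(0)
            cols["union.float"].append(union_value)
        else:
            cols["union.index"].append(1)
            cols["union.string"].append(union_value)
    return cols


def reassemble(cols: Dict[str, list]) -> List[Row]:
    """Rebuild rows by consuming each branch chunk in order of the index chunk."""
    floats = iter(cols["union.float"])
    strings = iter(cols["union.string"])
    rows: List[Row] = []
    for first, branch, last in zip(cols["int"], cols["union.index"], cols["long"]):
        union_value = next(floats) if branch == 0 else next(strings)
        rows.append((first, union_value, last))
    return rows


rows = [(1, 2.5, 10), (2, "x", 20), (3, 0.5, 30)]
cols = split_union_columns(rows)
assert reassemble(cols) == rows
```

With this layout a reader interested only in the int or long field never
touches the union chunks, and each branch chunk holds values of a single type,
which should compress better. The cost is that the branch chunks are shorter
than the row count, so reading the union always requires the index chunk. The
first option (the whole union in one chunk) avoids that, but mixes types within
a chunk.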
> add a column-major codec for data files
> ---------------------------------------
>
> Key: AVRO-806
> URL: https://issues.apache.org/jira/browse/AVRO-806
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.