[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058166#comment-13058166 ]
Jeff Hammerbacher commented on AVRO-806:
----------------------------------------
> I think the advantage of a columnar format is to avoid touching data that's
> not needed, and avoiding decompression is consistent with that.
During the design of RCFile, the folks from Facebook found that packing a few
columns together into a file gave better performance than putting each column
into its own file. There's a trade-off between the CPU spent on
deserialization and the IO spent reading the data off disk. Avoiding
decompression of columns that are not accessed appeared to be important for
Hive performance.
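As a rough illustration of the trade-off described above, here is a minimal Python sketch of grouping columns and decompressing only the groups a query touches. The function names (write_groups, read_columns), the example grouping, and the use of JSON plus zlib are hypothetical stand-ins chosen for brevity; this is not RCFile's on-disk format, nor Avro's.

```python
import json
import zlib


def write_groups(records, groups):
    """Compress each column group into its own chunk.

    Within a chunk the group's columns are stored column-major. JSON is
    used only for illustration; it is not Avro's binary encoding.
    """
    chunks = []
    for group in groups:
        columns = {name: [r[name] for r in records] for name in group}
        chunks.append(zlib.compress(json.dumps(columns).encode("utf-8")))
    return chunks


def read_columns(chunks, groups, wanted):
    """Return (values by column, number of chunks decompressed).

    Chunks whose group holds none of the wanted columns are skipped
    and never decompressed.
    """
    result = {}
    decompressed = 0
    for group, chunk in zip(groups, chunks):
        if not any(name in wanted for name in group):
            continue
        decompressed += 1
        columns = json.loads(zlib.decompress(chunk).decode("utf-8"))
        for name in group:
            if name in wanted:
                result[name] = columns[name]
    return result, decompressed


# Hypothetical example: "id" and "ts" are usually read together,
# so they share a group; the bulkier "body" column gets its own.
records = [
    {"id": 1, "ts": 100, "body": "hello"},
    {"id": 2, "ts": 101, "body": "world"},
]
groups = [["id", "ts"], ["body"]]
chunks = write_groups(records, groups)
cols, n = read_columns(chunks, groups, {"id"})
```

Smaller groups mean less unneeded data is decompressed for a narrow query, while larger groups mean fewer separately compressed chunks to locate and inflate when a query reads many columns together; which side wins depends on the workload.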
> add a column-major codec for data files
> ---------------------------------------
>
> Key: AVRO-806
> URL: https://issues.apache.org/jira/browse/AVRO-806
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.
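The codec described in the issue can be sketched roughly as follows, assuming a record is a Python dict and each field's values are JSON-encodable. Each field's column is compressed separately and prefixed with a 4-byte length, so a reader can jump past fields it does not need. The layout and the names encode_block and read_field are hypothetical; this is not the codec AVRO-806 actually defines, and JSON stands in for Avro's binary encoding.

```python
import json
import struct
import zlib


def encode_block(records, fields):
    """Lay out one block column-major: per field, a 4-byte big-endian
    length followed by that field's zlib-compressed values."""
    out = bytearray()
    for name in fields:
        # All values of one field are stored contiguously.
        raw = json.dumps([r[name] for r in records]).encode("utf-8")
        packed = zlib.compress(raw)
        out += struct.pack(">I", len(packed))
        out += packed
    return bytes(out)


def read_field(block, fields, wanted):
    """Return the values of one field, skipping the others by length."""
    offset = 0
    for name in fields:
        (length,) = struct.unpack_from(">I", block, offset)
        offset += 4
        if name == wanted:
            return json.loads(zlib.decompress(block[offset:offset + length]))
        offset += length  # jump over this field without decompressing it
    raise KeyError(wanted)


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
fields = ["id", "name"]
block = encode_block(records, fields)
names = read_field(block, fields, "name")
```

Here the field order comes from the record schema, so a reader needs only the schema and the length prefixes to locate a field. Storing one field's values together also tends to help compression, since neighboring values of the same field are often similar.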
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira