[
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021858#comment-13021858
]
Doug Cutting commented on AVRO-806:
-----------------------------------
The writer would keep a buffer per field. As records are added, each field
would be encoded to its respective buffer. When the total amount of buffered
data reaches the desired block size, all column buffers would be flushed,
preceded by an index listing the (compressed) sizes of each of the column buffers.
Each buffer would be compressed prior to writing, probably with Snappy.
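A minimal sketch of that buffering scheme. The class and method names are hypothetical, the index is written as fixed 4-byte big-endian lengths for simplicity, and java.util.zip's Deflater stands in for Snappy, which is not in the JDK:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;

// Hypothetical sketch: one buffer per field; flush all columns as a block
// once the total buffered size reaches the target block size.
public class ColumnBlockWriter {
    private final ByteArrayOutputStream[] columns; // one buffer per field
    private final int blockSize;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public ColumnBlockWriter(int numFields, int blockSize) {
        this.columns = new ByteArrayOutputStream[numFields];
        for (int i = 0; i < numFields; i++) columns[i] = new ByteArrayOutputStream();
        this.blockSize = blockSize;
    }

    /** Append one record: values[i] is field i's already-encoded bytes. */
    public void write(byte[][] values) throws IOException {
        int buffered = 0;
        for (int i = 0; i < columns.length; i++) {
            columns[i].write(values[i]);
            buffered += columns[i].size();
        }
        if (buffered >= blockSize) flush();
    }

    /** Write an index of compressed sizes, then the compressed column buffers. */
    public void flush() throws IOException {
        byte[][] compressed = new byte[columns.length][];
        for (int i = 0; i < columns.length; i++) {
            compressed[i] = deflate(columns[i].toByteArray());
            columns[i].reset();
        }
        for (byte[] c : compressed) writeInt(c.length);  // the index
        for (byte[] c : compressed) out.write(c);        // the column data
    }

    // Deflater used here as a stand-in for Snappy.
    private static byte[] deflate(byte[] data) {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) bos.write(buf, 0, d.deflate(buf));
        d.end();
        return bos.toByteArray();
    }

    private void writeInt(int v) { // fixed 4-byte big-endian length, for simplicity
        out.write((v >>> 24) & 0xff); out.write((v >>> 16) & 0xff);
        out.write((v >>> 8) & 0xff);  out.write(v & 0xff);
    }

    public byte[] toByteArray() { return out.toByteArray(); }
}
```

A real codec would use Avro's variable-length encodings and the Snappy library rather than the fixed-width lengths and Deflater used here.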
The reader would have a decoder for each field. To skip fields not of
interest, the application would specify a subset of the written schema. The
reader would only decompress and process the fields present in the schema
provided by the application.
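The reading side can be sketched the same way. This hypothetical reader pairs with the writer sketch above (4-byte big-endian index entries, Inflater standing in for Snappy decompression); columns not named in the requested field set are skipped over without being decompressed:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.util.Set;
import java.util.zip.Inflater;

// Hypothetical sketch: read the per-column size index, then decompress only
// the columns the application asked for, seeking past the rest.
public class ColumnBlockReader {
    /**
     * block layout assumed: n 4-byte compressed sizes, then n compressed
     * column buffers. Returns decoded bytes per column, null where skipped.
     */
    public static byte[][] readBlock(byte[] block, String[] fieldNames,
                                     Set<String> wanted) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        int n = fieldNames.length;
        int[] sizes = new int[n];
        for (int i = 0; i < n; i++) sizes[i] = in.readInt(); // the index
        byte[][] columns = new byte[n][];
        for (int i = 0; i < n; i++) {
            if (wanted.contains(fieldNames[i])) {
                byte[] compressed = new byte[sizes[i]];
                in.readFully(compressed);
                columns[i] = inflate(compressed);
            } else {
                in.skipBytes(sizes[i]); // cheap skip: never decompressed
            }
        }
        return columns;
    }

    // Inflater used here as a stand-in for Snappy decompression.
    private static byte[] inflate(byte[] data) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
            int k = inf.inflate(buf);
            if (k == 0 && inf.needsInput()) break;
            bos.write(buf, 0, k);
        }
        inf.end();
        return bos.toByteArray();
    }
}
```

The index is what makes the skip cheap: the reader knows each column's compressed size up front, so an uninteresting field costs a seek rather than a decompress-and-decode pass.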
> add a column-major codec for data files
> ---------------------------------------
>
> Key: AVRO-806
> URL: https://issues.apache.org/jira/browse/AVRO-806
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.