[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021858#comment-13021858 ]

Doug Cutting commented on AVRO-806:
-----------------------------------

The writer would keep a buffer per field.  As records are added, each field 
would be encoded to its respective buffer.  When the total amount of buffered 
data reaches the desired block size, all column buffers would be flushed, 
preceded by an index listing the compressed size of each column buffer.

Each buffer would be compressed prior to writing, probably with Snappy.
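A minimal sketch of that writer side in Java.  All names here are illustrative, not Avro's actual API, and java.util.zip.Deflater stands in for Snappy (which is not in the JDK):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;

/** Sketch of a column-major block writer: one buffer per field,
 *  all flushed together once the block-size threshold is reached.
 *  Class and method names are assumptions; Deflater stands in for Snappy. */
public class ColumnBlockWriter {
  private final ByteArrayOutputStream[] fieldBuffers;
  private final DataOutputStream out;
  private final int blockSize;

  public ColumnBlockWriter(OutputStream out, int fieldCount, int blockSize) {
    this.out = new DataOutputStream(out);
    this.blockSize = blockSize;
    this.fieldBuffers = new ByteArrayOutputStream[fieldCount];
    for (int i = 0; i < fieldCount; i++) {
      fieldBuffers[i] = new ByteArrayOutputStream();
    }
  }

  /** Append one record: field i's pre-encoded bytes go to buffer i. */
  public void write(byte[][] encodedFields) throws IOException {
    int total = 0;
    for (int i = 0; i < encodedFields.length; i++) {
      fieldBuffers[i].write(encodedFields[i]);
      total += fieldBuffers[i].size();
    }
    if (total >= blockSize) {
      flush();
    }
  }

  /** Compress each column, write the index of compressed sizes, then the columns. */
  public void flush() throws IOException {
    byte[][] compressed = new byte[fieldBuffers.length][];
    for (int i = 0; i < fieldBuffers.length; i++) {
      compressed[i] = compress(fieldBuffers[i].toByteArray());
      fieldBuffers[i].reset();
    }
    for (byte[] c : compressed) out.writeInt(c.length);  // the index
    for (byte[] c : compressed) out.write(c);            // the column data
    out.flush();
  }

  private static byte[] compress(byte[] data) {
    Deflater d = new Deflater();
    d.setInput(data);
    d.finish();
    ByteArrayOutputStream o = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!d.finished()) {
      o.write(buf, 0, d.deflate(buf));
    }
    d.end();
    return o.toByteArray();
  }
}
```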

The reader would have a decoder for each field.  To skip fields not of 
interest, the application would specify a subset of the schema that was 
written.  The reader would only decompress and process fields present in the 
schema provided by the application.
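The reader side can be sketched the same way: the per-column index of compressed sizes makes skipping an unwanted column a simple seek, with no decompression.  Again the names are illustrative assumptions, with java.util.zip.Inflater standing in for Snappy:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Set;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Sketch of block reading with column skipping: only the fields the
 *  application asked for are decompressed; the rest are skipped using
 *  the compressed sizes recorded in the block index.
 *  Names are assumptions; Inflater stands in for Snappy. */
public class ColumnBlockReader {
  /** Returns decoded column bytes for wanted fields, null for skipped ones. */
  public static byte[][] readBlock(DataInputStream in, int fieldCount,
                                   Set<Integer> wantedFields)
      throws IOException, DataFormatException {
    int[] sizes = new int[fieldCount];              // the block's index
    for (int i = 0; i < fieldCount; i++) {
      sizes[i] = in.readInt();
    }
    byte[][] columns = new byte[fieldCount][];
    for (int i = 0; i < fieldCount; i++) {
      if (wantedFields.contains(i)) {
        byte[] compressed = new byte[sizes[i]];
        in.readFully(compressed);
        columns[i] = decompress(compressed);
      } else {
        in.skipBytes(sizes[i]);                     // never decompressed
      }
    }
    return columns;
  }

  private static byte[] decompress(byte[] data) throws DataFormatException {
    Inflater inf = new Inflater();
    inf.setInput(data);
    java.io.ByteArrayOutputStream o = new java.io.ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!inf.finished()) {
      int n = inf.inflate(buf);
      if (n == 0 && inf.needsInput()) break;
      o.write(buf, 0, n);
    }
    inf.end();
    return o.toByteArray();
  }
}
```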

> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira