[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022395#comment-13022395 ]

Doug Cutting commented on AVRO-806:
-----------------------------------

The question is not whether the elements of depth > 1 are included, but whether 
they're each stored in a distinct column.  Regardless, you will read the data 
file the same way, using a schema with a subset of the fields, even if you're 
not using the column-major codec at all.  So if you have a query that scans 
only field x.y.z, then storing the values of x.y together in one column will 
still make things faster than row-order storage, but perhaps not as fast as 
storing the x.y.z values in their own column, especially if y has a lot of 
other fields.  Note that Avro is already fast at skipping string and binary 
values that are not wanted: it reads the length and increments the buffer 
pointer.  So column-major storage will provide the biggest speedup for 
structures that have a lot of numeric fields that queries often ignore. 
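
A minimal Python sketch of that skipping, assuming the Avro binary encoding 
rules: int/long values are zig-zag varints, and string/bytes values are a long 
length followed by the raw bytes. The helper names (encode_long, read_long, 
skip_string, skip_long) are hypothetical, not Avro library APIs. It shows why 
a string is skipped with one pointer jump while a varint-encoded number still 
has to be scanned byte by byte.

```python
def encode_long(n: int) -> bytes:
    # Zig-zag maps signed to unsigned (0,-1,1,-2 -> 0,1,2,3), then emit
    # 7 bits per byte, low bits first, high bit set on all but the last byte.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    # Decode one varint starting at pos; return (value, position after it).
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def skip_string(buf: bytes, pos: int) -> int:
    # Cheap: read the length prefix, then move the pointer past the payload
    # without looking at it, however long the string is.
    length, pos = read_long(buf, pos)
    return pos + length


def skip_long(buf: bytes, pos: int) -> int:
    # Not a jump: each byte must be inspected to find where the varint ends.
    while buf[pos] & 0x80:
        pos += 1
    return pos + 1


# One row-order record {name: string, count: long, total: long}.
row = encode_string("x" * 1000) + encode_long(-7) + encode_long(123456789)

# A query that wants only `total` must skip the earlier fields in order.
pos = skip_string(row, 0)  # 2-byte length prefix, then a 1000-byte jump
pos = skip_long(row, pos)  # scans the varint byte by byte
total, end = read_long(row, pos)
print(total, end == len(row))  # prints: 123456789 True
```

In a column-major block, the values of `count` would sit together, so a 
reader could skip that whole column at once instead of scanning one varint 
per record.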

> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.
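
The layout the issue describes can be sketched in Python under stated 
assumptions: the block format below (a record count, then one 
length-prefixed column per field, every field a long) is hypothetical and 
is not a specification of the AVRO-806 codec; write_block and read_column 
are illustrative names. Storing each field's values contiguously is what 
lets a reader skip an unwanted field for the whole block in one step, and 
it also places similar values next to each other, which tends to help 
compression.

```python
def encode_long(n: int) -> bytes:
    # Avro-style zig-zag varint.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    # Decode one zig-zag varint; return (value, position after it).
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def write_block(records: list[dict], fields: list[str]) -> bytes:
    # Hypothetical layout: record count, then one column per field in
    # schema order, each prefixed with its byte length.
    out = bytearray(encode_long(len(records)))
    for f in fields:
        col = b"".join(encode_long(r[f]) for r in records)
        out += encode_long(len(col)) + col
    return bytes(out)


def read_column(block: bytes, fields: list[str], wanted: str) -> list[int]:
    count, pos = read_long(block, 0)
    for f in fields:
        size, pos = read_long(block, pos)
        if f != wanted:
            pos += size  # skip the entire column with one pointer increment
            continue
        values = []
        for _ in range(count):
            v, pos = read_long(block, pos)
            values.append(v)
        return values
    raise KeyError(wanted)


fields = ["x", "y", "z"]
records = [{"x": i, "y": i * i, "z": -i} for i in range(5)]
block = write_block(records, fields)
print(read_column(block, fields, "z"))  # prints: [0, -1, -2, -3, -4]
```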

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
