[jira] [Commented] (AVRO-806) add a column-major codec for data files

Scott Carey (JIRA) Tue, 21 Jun 2011 11:04:12 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052721#comment-13052721
 ]


Scott Carey commented on AVRO-806:
----------------------------------

{quote}
For now, unions have a single column, irrespective of the number and type of 
branches.
{quote}
I suspect "for now" will turn out to mean "forever".  This is a binary format, 
so I think we should make unions columnar as well.  Changing the layout later 
while staying backwards compatible will be hard and high-maintenance.
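
To make "columnar unions" concrete, here is a rough sketch of one possible layout: a column of branch indexes plus one value column per branch. This is only an illustration, not what the attached patches do. The names BRANCHES, split_union and merge_union are hypothetical, and the union is assumed to be ["null", "long", "string"], modelled with Python's None, int and str.

```python
# Hypothetical union ["null", "long", "string"], modelled with Python types.
BRANCHES = [type(None), int, str]


def split_union(values):
    """Split union values into a branch-index column and per-branch columns."""
    tags = []                                  # one branch index per value
    branch_columns = [[] for _ in BRANCHES]    # one value column per branch
    for value in values:
        index = BRANCHES.index(type(value))
        tags.append(index)
        if value is not None:                  # the null branch carries no data
            branch_columns[index].append(value)
    return tags, branch_columns


def merge_union(tags, branch_columns):
    """Rebuild the original row-order values from the split columns."""
    cursors = [0] * len(BRANCHES)              # read position in each column
    values = []
    for index in tags:
        if BRANCHES[index] is type(None):
            values.append(None)
        else:
            values.append(branch_columns[index][cursors[index]])
            cursors[index] += 1
    return values


rows = [None, 3, "a", 7, None, "b"]
tags, branch_columns = split_union(rows)
```

With this layout each branch column holds values of a single type, and a reader that only needs one branch can skip the others after reading the index column.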

{quote}
How does this integrate with compression? I suspect we should compress each 
column separately, so the compression codec needs to be invoked on each buffer 
before it's written. This means that the Encoder must know about the 
compression codec.
{quote}

This depends on what the goal is.  If we want to avoid decompressing columns 
that are not accessed, then we do need to compress each column separately.  
Otherwise it is not necessary, and compression ratios will be best if large 
blocks are compressed as a unit with all columns.
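
A minimal sketch of the two options, using zlib from the Python standard library as a stand-in for whatever codec the file uses. The column names and helper functions are made up for illustration and are not part of Avro or of the patch.

```python
import zlib


def compress_whole_block(columns):
    """Option A: concatenate all column buffers and compress them as a unit."""
    return zlib.compress(b"".join(columns.values()))


def compress_per_column(columns):
    """Option B: compress each column buffer on its own."""
    return {name: zlib.compress(data) for name, data in columns.items()}


def read_column(compressed_columns, name):
    """With option B, only the requested column is decompressed."""
    return zlib.decompress(compressed_columns[name])


# Hypothetical column buffers for one block of a record schema.
columns = {
    "id": b"".join(i.to_bytes(4, "big") for i in range(1000)),
    "country": b"US" * 1000,
    "comment": b"lorem ipsum dolor sit amet " * 200,
}

whole_block = compress_whole_block(columns)
per_column = compress_per_column(columns)
country = read_column(per_column, "country")
```

With option A a reader has to decompress the whole block, and know the column offsets inside it, before it can touch any single column. With option B it can skip the columns it does not need, at the cost of running the codec once per buffer.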


{quote}
and Snappy's fast enough that using it all of the time doesn't cost much.
{quote}
Yes, the CRC currently used costs more than Snappy.  


> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
