[jira] [Updated] (AVRO-806) add a column-major codec for data files

Doug Cutting (JIRA) Fri, 22 Apr 2011 15:59:46 -0700

     [ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Doug Cutting updated AVRO-806:
------------------------------

    Attachment: AVRO-806.patch

This is a work in progress.

I believe the output is correct and complete, but the input side is not right 
yet so I can't test it.  To read the column format I need ResolvingDecoder to 
call three new Decoder methods on the nested Decoder:
 - startRecord() at the beginning of each record
 - startField() at the beginning of each field
 - endRecord() at the end of each record.

I've changed ResolvingDecoder to try to make these calls, but my changes don't 
work, and I don't understand ResolvingDecoder well enough to fix them.  Thiru, 
can you please help me here?
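
For readers following along, the call order described above can be sketched 
roughly as follows.  This is only an illustration, not code from the patch: 
the RecordEvents interface, TracingEvents, and RecordWalkSketch are 
hypothetical names, nothing here touches the real Decoder or ResolvingDecoder 
classes, and it assumes endRecord() is meant to fire once per record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface. Only the three method names come from the
// message; the interface itself is an assumption made for illustration.
interface RecordEvents {
    void startRecord();
    void startField();
    void endRecord();
}

// Records every callback so the call order can be inspected afterwards.
class TracingEvents implements RecordEvents {
    final List<String> trace = new ArrayList<>();
    public void startRecord() { trace.add("startRecord"); }
    public void startField()  { trace.add("startField"); }
    public void endRecord()   { trace.add("endRecord"); }
}

class RecordWalkSketch {
    // Stand-in for the resolving layer: it announces record and field
    // boundaries to the nested decoder as it walks each record.
    static void walk(RecordEvents nested, int records, int fieldsPerRecord) {
        for (int r = 0; r < records; r++) {
            nested.startRecord();
            for (int f = 0; f < fieldsPerRecord; f++) {
                nested.startField();
                // A real reader would decode the field's value here.
            }
            nested.endRecord();
        }
    }

    public static void main(String[] args) {
        TracingEvents events = new TracingEvents();
        walk(events, 2, 3);
        System.out.println(events.trace);
    }
}
```

With hooks like these, a column-oriented decoder could switch to the right 
column's buffer on each startField() call and reset its position at record 
boundaries.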

That would get the input side to work, but it wouldn't yet be much faster when 
columns are elided from the schema.  To make it fast we also need to change 
ResolvingDecoder to take advantage of the new Decoder#skipField() method.
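
To illustrate why skipping a whole field pays off in a column-major layout, 
here is a minimal sketch.  It does not use the format in the patch or any Avro 
API: the ColumnBlockSketch class, the two-column (id, name) record, and the 
length-prefixed block layout are all assumptions chosen to keep the example 
small.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ColumnBlockSketch {
    // Assumed layout: [idColumnBytes][all ids][nameColumnBytes][all names].
    // Each name is stored as a 4-byte length followed by its UTF-8 bytes.
    static byte[] writeBlock(long[] ids, String[] names) {
        ByteBuffer idCol = ByteBuffer.allocate(ids.length * Long.BYTES);
        for (long id : ids) idCol.putLong(id);

        byte[][] encoded = new byte[names.length][];
        int nameBytes = 0;
        for (int i = 0; i < names.length; i++) {
            encoded[i] = names[i].getBytes(StandardCharsets.UTF_8);
            nameBytes += Integer.BYTES + encoded[i].length;
        }

        ByteBuffer block = ByteBuffer.allocate(
            Integer.BYTES + idCol.capacity() + Integer.BYTES + nameBytes);
        block.putInt(idCol.capacity());
        block.put(idCol.array());
        block.putInt(nameBytes);
        for (byte[] e : encoded) {
            block.putInt(e.length);
            block.put(e);
        }
        return block.array();
    }

    // Reads only the name column. The id column is skipped with a single
    // position change instead of decoding every id value.
    static List<String> readNames(byte[] block, int count) {
        ByteBuffer in = ByteBuffer.wrap(block);
        int idBytes = in.getInt();
        in.position(in.position() + idBytes); // skip the whole id column
        in.getInt();                          // name column length, unused here
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] s = new byte[in.getInt()];
            in.get(s);
            names.add(new String(s, StandardCharsets.UTF_8));
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(new long[] {1L, 2L}, new String[] {"a", "b"});
        System.out.println(readNames(block, 2));
    }
}
```

In a row-major block the reader would have to decode or step over each id 
individually to reach the names; here the cost of ignoring a column does not 
grow with the number of records.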

> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-806.patch
>
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
