[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448281#comment-13448281 ]
Jakob Homan commented on AVRO-806:
----------------------------------
Yes, that's all reasonable. My concern is just with enforcing a 1:1:1
relationship between row groups, blocks, and files. RCFile's very small
recommended row group size (4 MB, I believe) certainly doesn't make sense from
an IO perspective. But if the only way to increase parallelism on Trevni files
is to decrease the size of row groups (and correspondingly increase the number
of files), this may be a problem. The file format need not enforce a 1:1:1
relationship; one could still have row groups large enough to make the IO
worthwhile (and still split on block boundaries), but have several of them
within a single Trevni file. This could certainly be supported as an option.
Either way, this is looking good.
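
To make the trade-off above concrete, here is a rough back-of-the-envelope
sketch. It is not Trevni code: the function names, the 64 GB dataset, and the
128 MB row group size are all hypothetical, and it assumes each row group is
block-aligned and becomes exactly one input split.

```python
MB = 1024 * 1024
GB = 1024 * MB


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def one_group_per_file(total_bytes, row_group_bytes):
    """Strict 1:1:1 layout: one row group per block per file.

    Assumption: every file is a single split, so files == splits.
    """
    files = ceil_div(total_bytes, row_group_bytes)
    return {"files": files, "splits": files}


def many_groups_per_file(total_bytes, row_group_bytes, groups_per_file):
    """Relaxed layout: several block-aligned row groups in one file.

    Assumption: each row group is still its own split, so parallelism
    follows the row group count while the file count shrinks.
    """
    groups = ceil_div(total_bytes, row_group_bytes)
    files = ceil_div(groups, groups_per_file)
    return {"files": files, "splits": groups}


if __name__ == "__main__":
    total = 64 * GB  # hypothetical dataset size
    print(one_group_per_file(total, 128 * MB))       # large groups, one per file
    print(one_group_per_file(total, 4 * MB))         # tiny groups, many files
    print(many_groups_per_file(total, 128 * MB, 8))  # large groups, fewer files
```

Under these assumptions the strict layout can only gain splits by shrinking row
groups, which multiplies the file count, whereas the relaxed layout keeps the
same number of splits with far fewer files.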
> add a column-major codec for data files
> ---------------------------------------
>
> Key: AVRO-806
> URL: https://issues.apache.org/jira/browse/AVRO-806
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Attachments: AVRO-806.patch, AVRO-806-v2.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.