[ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085138#comment-14085138 ]
Josh Wills commented on CRUNCH-450:
-----------------------------------

So I read over this a bit more, and I don't think that supporting ORC files requires adding the OrcTypeFamily. As I read it, the ORC serialization relies primarily on the type class of the PType instance and delegates the actual deserialization logic to the ObjectInspector (which is the right thing to do, I believe). But then it seems to me that it would be possible to take in _any_ PType instance (Avro or Writable), extract its type class and the type classes of its sub-types, and then construct code that could read or write that data to an ORC file. At the lowest level, OrcTypes are WritableTypes with a custom serialization/deserialization protocol.

If that's not clear, I can whip up a version of the patch w/my preferred impl tomorrow.

> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Zhong Wang
>            Assignee: Josh Wills
>             Fix For: 0.11.0
>
>         Attachments: CRUNCH-450-submodule.1.patch, CRUNCH-450-submodule.2.patch, CRUNCH-450-submodule.patch, CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
>
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization

--
This message was sent by Atlassian JIRA (v6.2#6252)
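
To make the suggestion in the comment above concrete, here is a minimal sketch of the "walk the type class and its sub-types" idea. It does not use the Crunch or ORC APIs: TypeNode is a hypothetical stand-in for a PType (a type class plus a list of sub-types), and describe() is an illustrative helper that emits a Hive/ORC-style type string. The positional field names (_col0, _col1, ...) and the handful of supported classes are assumptions made for the example, not what the patch does.

```java
import java.util.Arrays;
import java.util.List;

public class OrcSchemaSketch {

    /** Hypothetical stand-in for a PType: a type class plus its sub-types. */
    static final class TypeNode {
        final Class<?> typeClass;
        final List<TypeNode> subTypes;

        TypeNode(Class<?> typeClass, TypeNode... subTypes) {
            this.typeClass = typeClass;
            this.subTypes = Arrays.asList(subTypes);
        }
    }

    /** Builds a Hive/ORC-style type string by recursing over the sub-types. */
    static String describe(TypeNode node) {
        Class<?> c = node.typeClass;
        // Leaf types: decided by the type class alone.
        if (c == Integer.class) return "int";
        if (c == Long.class) return "bigint";
        if (c == Double.class) return "double";
        if (c == Boolean.class) return "boolean";
        if (c == String.class) return "string";
        // A collection with exactly one sub-type becomes an array.
        if (List.class.isAssignableFrom(c) && node.subTypes.size() == 1) {
            return "array<" + describe(node.subTypes.get(0)) + ">";
        }
        // Anything else with sub-types is treated as a tuple-like struct
        // whose fields get positional names (an assumption of this sketch).
        if (!node.subTypes.isEmpty()) {
            StringBuilder sb = new StringBuilder("struct<");
            for (int i = 0; i < node.subTypes.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append("_col").append(i).append(':')
                  .append(describe(node.subTypes.get(i)));
            }
            return sb.append('>').toString();
        }
        throw new IllegalArgumentException("No mapping for " + c.getName());
    }

    public static void main(String[] args) {
        // A pair of (String, List<Long>), modeled with Object[] as the tuple class.
        TypeNode pair = new TypeNode(Object[].class,
                new TypeNode(String.class),
                new TypeNode(List.class, new TypeNode(Long.class)));
        System.out.println(describe(pair));
    }
}
```

The point of the sketch is only that the schema can be derived from the type tree itself, regardless of which type family produced it, which is why a separate OrcTypeFamily may be unnecessary.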