[ 
https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085138#comment-14085138
 ] 

Josh Wills commented on CRUNCH-450:
-----------------------------------

So I read over this a bit more, and I don't think that supporting ORC files 
requires adding the OrcTypeFamily. As I read it, the ORC serialization is 
primarily relying on the type class of the PType instance, and delegating the 
actual deserialization logic to the ObjectInspector (which is the right thing 
to do, I believe). But then it seems to me that it would be possible to take in 
_any_ PType instance (Avro or Writable), extract its type class and the type 
classes of its sub-types, and then construct code that could read or write that 
data to an ORC file. At the lowest level, OrcTypes are WritableTypes with a 
custom serialization/deserialization protocol.
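
A rough, self-contained sketch of the "walk the type classes" idea follows. It 
is not the patch and does not use the Crunch or Hive APIs: TypeNode is a 
hypothetical stand-in for the type class and sub-types that a PType exposes, 
and OrcSchemaSketch, describe, and the _colN field names are placeholders 
chosen for illustration. It only shows that a Hive-style type string can be 
derived from the type tree alone, without a dedicated type family.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a PType: a type class plus its sub-types. */
final class TypeNode {
  final Class<?> typeClass;
  final List<TypeNode> subTypes;

  TypeNode(Class<?> typeClass, TypeNode... subTypes) {
    this.typeClass = typeClass;
    this.subTypes = Collections.unmodifiableList(Arrays.asList(subTypes));
  }
}

/** Illustrative only: derives a Hive-style type string from a type tree. */
final class OrcSchemaSketch {
  private OrcSchemaSketch() {}

  static String describe(TypeNode node) {
    if (node.subTypes.isEmpty()) {
      return describePrimitive(node.typeClass);
    }
    // A node with sub-types is treated as a struct whose fields are the
    // sub-types in order; the field names are placeholders.
    StringBuilder sb = new StringBuilder("struct<");
    for (int i = 0; i < node.subTypes.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append("_col").append(i).append(':');
      sb.append(describe(node.subTypes.get(i)));
    }
    return sb.append('>').toString();
  }

  // Only a handful of leaf classes are covered; anything else is rejected
  // rather than guessed at.
  private static String describePrimitive(Class<?> c) {
    if (c == Integer.class) return "int";
    if (c == Long.class) return "bigint";
    if (c == Double.class) return "double";
    if (c == Boolean.class) return "boolean";
    if (c == String.class) return "string";
    throw new IllegalArgumentException("No mapping sketched for " + c.getName());
  }
}
```

In a real implementation the resulting description would presumably be handed 
to an ObjectInspector, which would do the actual reading and writing; that 
step is left out here.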

If that's not clear, I can whip up a version of the patch w/my preferred impl 
tomorrow.



> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Zhong Wang
>            Assignee: Josh Wills
>             Fix For: 0.11.0
>
>         Attachments: CRUNCH-450-submodule.1.patch, 
> CRUNCH-450-submodule.2.patch, CRUNCH-450-submodule.patch, CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize 
> objects into OrcStruct
> 3. Supporting column pruning optimization



--
This message was sent by Atlassian JIRA
(v6.2#6252)
