[ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076738#comment-14076738 ]
Wang Zhong commented on CRUNCH-450:
-----------------------------------

Thanks for your review, Josh! To answer your questions:
--
1) I implemented OrcTypeFamily because ORC's low-level file layout is distinctive enough to warrant its own type family. OrcStruct is also an unusual Writable implementation: it does not actually support write()/readFields(). To keep ORC separate from (and not mixed with) the other Writable formats, I created a standalone type family for it.
2) I agree that a crunch-hive submodule is a good idea for now. The Hive team is also refactoring the Hive dependencies to make them more concise and modular (HIVE-7423). Once this component has a modularized dependency, I hope we can move the ORC support into Crunch trunk.

> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Wang Zhong
>            Assignee: Josh Wills
>         Attachments: CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding an input source and an output target for ORC
> 2. Adding a new type family, OrcTypeFamily, to serialize/deserialize
> objects into OrcStruct
> 3. Supporting column pruning optimization

--
This message was sent by Atlassian JIRA
(v6.2#6252)