[ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076640#comment-14076640 ]
Josh Wills commented on CRUNCH-450:
-----------------------------------

Wow- that is a phenomenal amount of work- thanks for sending it along! A couple of high-level questions:

1) What does OrcTypeFamily buy me? We've flirted with expanding the set of TypeFamilies from Avro and Writable in the past, but have always been cautious about actually doing it b/c the two-typefamily assumption is baked into so many things in the system. If everything in Orc is compiled down to a type of Writable, would it still work as a collection of derived PTypes on top of the WritableTypeFamily?

2) We also try to avoid large and complex external dependencies in crunch-core-- could we move this into a new submodule, crunch-hive, which would contain all of our Hive dependency stuff? I think there's more of it that we want to include (e.g., CRUNCH-340) and a few other things I wouldn't mind having down the line, but I don't want to introduce the dependency complexity for pipelines that don't actually make use of Hive stuff.

> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Wang Zhong
>            Assignee: Josh Wills
>         Attachments: CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization

--
This message was sent by Atlassian JIRA
(v6.2#6252)