OK, it looks like I'll need to go with the row-by-row API. Just to make sure I understand correctly: is that the approach Apache Crunch uses, with ObjectInspectors, Writables/POJOs, etc.?
https://github.com/apache/crunch/blob/master/crunch-hive/src/main/java/org/apache/crunch/types/orc/OrcUtils.java

If not, what is considered the row-by-row API (not using VectorizedRowBatch or ColumnVectors)?

Thanks again,
Matt

On Fri, Jul 22, 2016 at 5:14 PM, Owen O'Malley <[email protected]> wrote:
> Hi Matt,
>
> On Fri, Jul 22, 2016 at 1:21 PM, Matt Burgess <[email protected]> wrote:
>
>> All,
>>
>> Is this the right place to ask questions about hive-orc? I know it was
>> split out into Apache ORC, and up until recently I have been using
>> Apache ORC 1.1.2 to convert Avro files to ORC files, but I was told I
>> need a version that works with only Hive 1.2.1.
>>
>
> This works great, although most of the ORC developers read both.
>
>> - Are complex types (list, map, struct, union, etc.) supported in
>> hive-orc 1.2.1? I don't see the ListColumnVector and such types.
>
> Before HIVE-12159, which went into Hive 2.1, the only way to read complex
> types was to use the row by row API.
>
>> I can't bring in that storage-api-2.1.1-pre-orc JAR because of a
>> conflict with BloomFilter, etc.
>>
>
> How bad is the breakage? Can we fix it with a patch to ORC?
>
>> - I was using VectorizedRowBatch to write my values in ORC 1.1.2, is
>> that the correct/recommended approach in 1.2.1? I see Apache Crunch
>> uses lots of MapReduce types but I would really like to limit the MR
>> dependencies if possible since my app will not always be on a Hadoop
>> node.
>>
>
> Yes, the ORC MapReduce shim uses the VectorizedRowBatch and converts them
> into WritableComparables so it will be fastest if you use
> VectorizedRowBatch directly. Although as you have discovered that won't
> work if you are trying to use hive-orc 1.2
>
>> - Are there any examples of converting Avro to ORC outside of Hive
>> (but using Avro and hive-orc)? I see a couple of examples of
>> reading/writing ORC files but nothing with Avro. No worries if not, I
>> am writing one as part of this effort :)
>>
>
> If you look at the benchmarking code in
> https://github.com/apache/orc/pull/43 , you'll see that I took a first stab
> at making an Avro writer that goes from ORC's TypeDescription and a
> VectorizedRowBatch.
>
> .. Owen
>
>> Thank you in advance,
>> Matt
>>
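For context on why complex types were a sticking point: in the vectorized API a list column is not stored as one List object per row, the way the row-by-row API hands it to you. In the storage-api that Apache ORC and Hive 2.1 use, ListColumnVector keeps per-row `offsets` and `lengths` that index into a single flattened `child` vector. The self-contained Java below is only a toy model of that layout, assuming a `list<bigint>` column; `ListColumnSketch`, `ToyListColumn`, `toColumn`, and `toRows` are hypothetical names, not part of the Hive or ORC API. It converts rows into that layout and back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (hypothetical, NOT the real ListColumnVector class) of how a
 * vectorized batch lays out a list<bigint> column: one flat child array
 * plus per-row offsets and lengths into it.
 */
public class ListColumnSketch {

    // Simplified stand-in for ListColumnVector's offsets/lengths/child fields.
    static final class ToyListColumn {
        final long[] offsets; // start index of each row's elements in child
        final long[] lengths; // number of elements in each row
        final long[] child;   // all elements of all rows, concatenated

        ToyListColumn(long[] offsets, long[] lengths, long[] child) {
            this.offsets = offsets;
            this.lengths = lengths;
            this.child = child;
        }
    }

    // Row-oriented input (one List per row) -> columnar layout.
    static ToyListColumn toColumn(List<List<Long>> rows) {
        int n = rows.size();
        long[] offsets = new long[n];
        long[] lengths = new long[n];
        List<Long> flat = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            List<Long> values = rows.get(r);
            offsets[r] = flat.size();
            lengths[r] = values.size();
            flat.addAll(values);
        }
        long[] child = new long[flat.size()];
        for (int i = 0; i < child.length; i++) {
            child[i] = flat.get(i);
        }
        return new ToyListColumn(offsets, lengths, child);
    }

    // Columnar layout -> one List per row again (what a row API exposes).
    static List<List<Long>> toRows(ToyListColumn col) {
        List<List<Long>> rows = new ArrayList<>();
        for (int r = 0; r < col.offsets.length; r++) {
            List<Long> values = new ArrayList<>();
            int start = (int) col.offsets[r];
            for (int i = 0; i < col.lengths[r]; i++) {
                values.add(col.child[start + i]);
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Long>> rows = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                new ArrayList<Long>(),
                Arrays.asList(4L, 5L));
        ToyListColumn col = toColumn(rows);
        System.out.println(Arrays.toString(col.offsets)); // [0, 3, 3]
        System.out.println(Arrays.toString(col.lengths)); // [3, 0, 2]
        System.out.println(Arrays.toString(col.child));   // [1, 2, 3, 4, 5]
        System.out.println(toRows(col).equals(rows));     // true
    }
}
```

An empty list simply gets length 0 and shares its offset with the next row. Converting back to per-row objects on every read is the kind of extra work Owen alludes to when he says the MapReduce shim is slower than using VectorizedRowBatch directly.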
