I'm leaning a bit towards 1), but I would love to get some input from the Avro community, since 1) also depends on them: we will submit some patches upstream that need to be reviewed and, someday, released.
Are Avro committers subscribed here, or should we reach out to them on their ML? Given that we are currently quite active in the C++ space, I feel that we can contribute quite some infrastructure in building and packaging that we do either way for Arrow. This might be quite helpful for the project. We have seen this with Parquet, where much of the development happens simply because it is part of Arrow. (Not suggesting to merge/fork the Avro codebase, but just to apply some of the best practices we learned while building Arrow.)

Uwe

On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote:
> I'd be +0.5 in favor of forking in this particular case. Since Avro is
> not vectorized (unlike Parquet and ORC) I suspect it may be more
> difficult to get the best performance using a general purpose API
> versus one that is more specialized to producing Arrow record batches.
> Given that there has been relatively light C++ development activity in
> Apache Avro and no releases for 2 years it does give me pause.
>
> We might want to look at Impala's Avro scanner, they are doing some
> LLVM IR cross-compilation also (they're using the Avro C++ library
> though)
>
> https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc
> https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc
>
> On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > I'm looking at incorporating Avro in Arrow C++ [1]. It seems that the Avro
> > C++ library APIs have improved from the last release. However, it is not
> > clear when a new release will be available (I asked on the JIRA item for
> > the next release [2] and received no response).
> >
> > I was wondering if there is a policy governing using other Apache projects
> > or how people felt about the following options:
> > 1. Depend on a specific git commit through the third-party library system.
> > 2. Copy the necessary source code temporarily to our project, and change
> > to using the next release when it is available.
> > 3. Fork the code we need (the main benefit I see here is being able to
> > refactor it to avoid having to deal with exceptions, easier integration
> > with our IO system and one less 3rd party dependency to deal with).
> > 4. Wait on the 1.9 release before proceeding.
> >
> > Thanks,
> > Micah
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-1209
> > [2] https://issues.apache.org/jira/browse/AVRO-2250
>