Great response, thanks for the info. I have a question about the Spark approach, which I have been considering but don't know much about.
Let's say I develop my application, call it AvroToORC.java, which has a dependency on Spark. Would the entire thing run within the same container, with my code using the Spark APIs from there in local mode?

On Wed, Jul 15, 2020 at 3:39 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi Ryan.
>
> If you need to build from the scratch, you may want to see a standalone
> converter example in Apache ORC repository.
>
> - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/ConvertTool.java
>
> Although it doesn't support Avro, there are CsvReader and JsonReader
> in the same directory. So, you may implement AvroReader similarly.
>
> - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/CsvReader.java
> - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/JsonReader.java
>
> However, you can use the existing software or converter tools.
> For example, You can simply dockerize Apache Spark 3.0.0 on JDK11
> docker image and use it. The full JDK11 (openjdk:11) is 627MB.
> If you use 11-jre-slim(`204MB`) as a base image,
> the final docker image (Apache Spark 3.0.0 + JDK11) will be 500MB.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 15, 2020 at 1:51 PM Ryan Schachte <coderyanschac...@gmail.com>
> wrote:
>
> > I'm writing a standalone Java process and interested in converting the
> > consumed Avro messages to ORC. I've seen a plethora of examples of writing
> > to ORC, but the conversion to ORC from Avro is what I can't seem to find a
> > lot of examples of.
> >
> > This is just a standard Java process running inside of a container.