Great response, thanks for the info. I have a question about the Spark
approach. I have been thinking about it, but I don't have much experience
in this area.

Let's say I develop my application, call it AvroToORC.java, with a
dependency on Spark. Would the whole thing run within a single container,
with my code calling the Spark APIs in local mode?
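
To make the question concrete, here is a rough sketch of what I have in
mind. It is untested and rests on several assumptions: the spark-sql and
spark-avro artifacts are on the classpath, the Avro messages have already
been written out as Avro container files, and both paths are placeholders
passed on the command line.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AvroToORC {
    public static void main(String[] args) {
        // Hypothetical input and output locations, supplied by the caller.
        String avroInput = args[0];
        String orcOutput = args[1];

        // "local[*]" runs the driver and executors inside this one JVM,
        // so no separate cluster is involved.
        SparkSession spark = SparkSession.builder()
                .appName("AvroToORC")
                .master("local[*]")
                .getOrCreate();
        try {
            // The "avro" format comes from the external spark-avro module.
            Dataset<Row> records = spark.read().format("avro").load(avroInput);

            // Write the same rows back out as ORC files under orcOutput.
            records.write().orc(orcOutput);
        } finally {
            // Release Spark resources before the process exits.
            spark.stop();
        }
    }
}
```

If that is roughly right, the container would only need a JRE plus the
application jar and its Spark dependencies.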

On Wed, Jul 15, 2020 at 3:39 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi Ryan.
>
> If you need to build from scratch, you may want to look at the standalone
> converter example in the Apache ORC repository.
>
>     - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/ConvertTool.java
>
> Although it doesn't support Avro, there are CsvReader and JsonReader
> in the same directory. So, you may implement AvroReader similarly.
>
>     - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/CsvReader.java
>     - https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/JsonReader.java
>
> However, you can also use existing software or converter tools.
> For example, you can simply dockerize Apache Spark 3.0.0 on a JDK11
> docker image and use it. The full JDK11 image (openjdk:11) is 627MB.
> If you use 11-jre-slim (`204MB`) as the base image,
> the final docker image (Apache Spark 3.0.0 + JDK11) will be 500MB.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 15, 2020 at 1:51 PM Ryan Schachte <coderyanschac...@gmail.com>
> wrote:
>
> > I'm writing a standalone Java process and am interested in converting the
> > consumed Avro messages to ORC. I've seen a plethora of examples of writing
> > to ORC, but I can't seem to find many examples of converting from Avro to
> > ORC.
> >
> > This is just a standard Java process running inside of a container.
> >
>
