Maybe you can get more feedback and help on d...@nifi.apache.org for that code patch. ;)
On Thu, Jul 16, 2020 at 1:27 PM Ryan Schachte <coderyanschac...@gmail.com> wrote:
> Hey Matt,
> I'm exploring the NiFi code and I feel I'm really close. This code will
> probably work great for me, but I'm getting a failure because it's seeing a
> TypeInfo struct once I delegate the code to:
>
> row[i] = OrcUtils.convertToORCObject(OrcUtils.getOrcField(fieldSchema), o);
>
> *Java Code:*
>
> Schema avroSchema = record.getSchema();
> TypeInfo orcInfo = OrcUtils.getOrcField(avroSchema);
> TypeDescription orcSchema = TypeDescription.fromString(orcInfo.getTypeName());
> Writer orcWriter = OrcWriter.createWriter(orcSchema, driver.generateTmpPath());
>
> *Sample Data:*
> struct<businessDayDate:string,anotherField:string,anotherFieldAgain:string>
>
> On Wed, Jul 15, 2020 at 11:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> > That's good for you, Ryan, because there are many alternatives.
> >
> > FYI, Apache Spark 3.0.0 is using Apache Hive 2.3.7.
> > And everything runs in local mode on a single container.
> >
> > > Would the entire thing run within the same container and then I leverage
> > > the Spark APIs from that in local mode?
> >
> > More simply, you can generate a minimal Scala script at runtime like the
> > following and run it via the Spark shell in that container.
> >
> > $ cat hello.scala
> > print("a")
> > $ bin/spark-shell -I hello.scala
> >
> > Bests,
> > Dongjoon.
> >
> > On Wed, Jul 15, 2020 at 7:34 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > Ryan,
> > >
> > > It's possible there are some changes that would cause that code not to
> > > compile for Hive 2, but I have done some work porting similar processors
> > > to Hive 2, and as I recall it was mostly API-type breaking changes and
> > > not so much on the behavior side of things, more of a Maven and
> > > Java-package-name kind of thing.
> > >
> > > Regards,
> > > Matt
> > >
> > > On Wed, Jul 15, 2020 at 8:39 PM Ryan Schachte
> > > <coderyanschac...@gmail.com> wrote:
> > > >
> > > > Great, thanks Matt! Looking at this code now and I feel it will really
> > > > help me a lot. Anything you think would break using this logic for
> > > > Hive 2.3.5?
> > > >
> > > > On Wed, Jul 15, 2020 at 5:04 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > > >
> > > > > Ryan,
> > > > >
> > > > > In Apache NiFi we have a ConvertAvroToORC processor [1]; you may find
> > > > > code there that you can use in your Java program (take a look at line
> > > > > 212 and down). We had to create our own OrcFileWriter because the one
> > > > > in Apache ORC writes to a FileSystem, where we needed to write to our
> > > > > own FlowFile component. But all the relevant code should be there (you
> > > > > can replace the createWriter() call with the normal ORC one); one
> > > > > caveat is that it's for Apache Hive 1.2, so you may need to make
> > > > > changes if you're using Hive 3 libraries, for example.
> > > > >
> > > > > Regards,
> > > > > Matt
> > > > >
> > > > > [1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java
> > > > >
> > > > > On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte
> > > > > <coderyanschac...@gmail.com> wrote:
> > > > > >
> > > > > > I'm writing a standalone Java process and I'm interested in
> > > > > > converting the consumed Avro messages to ORC. I've seen a plethora
> > > > > > of examples of writing to ORC, but I can't seem to find many
> > > > > > examples of converting Avro to ORC.
> > > > > >
> > > > > > This is just a standard Java process running inside a container.
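---

A minimal sketch of the standalone Avro-to-ORC conversion discussed in this thread, using the Apache ORC core writer (OrcFile, TypeDescription, VectorizedRowBatch) directly rather than the NiFi OrcUtils helpers. It assumes a flat Avro record of string-only fields, matching the sample struct<businessDayDate:string,...> above; the AvroToOrc class name and the field-by-field schema mapping are illustrative assumptions, not code taken from NiFi, Hive, or ORC.

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

import java.io.File;
import java.nio.charset.StandardCharsets;

public class AvroToOrc {

    public static void main(String[] args) throws Exception {
        File avroFile = new File(args[0]);   // e.g. input.avro
        Path orcPath = new Path(args[1]);    // e.g. output.orc

        DataFileReader<GenericRecord> reader =
                new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>());
        Schema avroSchema = reader.getSchema();

        // Hand-rolled schema mapping: every Avro field becomes an ORC string.
        // Nested records, unions, and logical types need a fuller mapping,
        // such as the one in ConvertAvroToORC.java linked above.
        TypeDescription orcSchema = TypeDescription.createStruct();
        for (Schema.Field f : avroSchema.getFields()) {
            orcSchema.addField(f.name(), TypeDescription.createString());
        }

        Writer writer = OrcFile.createWriter(orcPath,
                OrcFile.writerOptions(new Configuration()).setSchema(orcSchema));
        VectorizedRowBatch batch = orcSchema.createRowBatch();

        while (reader.hasNext()) {
            GenericRecord record = reader.next();
            int row = batch.size++;
            for (int col = 0; col < avroSchema.getFields().size(); col++) {
                Object value = record.get(col);
                BytesColumnVector vector = (BytesColumnVector) batch.cols[col];
                if (value == null) {
                    vector.noNulls = false;
                    vector.isNull[row] = true;
                } else {
                    vector.setVal(row, value.toString().getBytes(StandardCharsets.UTF_8));
                }
            }
            // Flush a full batch to the ORC file and start a new one.
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size > 0) {
            writer.addRowBatch(batch);
        }
        writer.close();
        reader.close();
    }
}

Because this writes through OrcFile.createWriter with a Hadoop Path, it works in a plain Java process inside a container without a Hive metastore; the caveat from the thread still applies if you pull in the NiFi/Hive 1.2 OrcUtils classes instead.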