Maybe you can get more feedback and help on d...@nifi.apache.org for that code patch. ;)
On Thu, Jul 16, 2020 at 1:27 PM Ryan Schachte <coderyanschac...@gmail.com> wrote:
> Hey Matt,
> I'm exploring the NiFi code and I feel I'm really close. This code will
> probably work great for me, but I'm getting a failure because it's seeing a
> TypeInfo struct once I delegate the code to:
>
> row[i] = OrcUtils.convertToORCObject(OrcUtils.getOrcField(fieldSchema), o);
>
> *Java Code:*
>
> Schema avroSchema = record.getSchema();
> TypeInfo orcInfo = OrcUtils.getOrcField(avroSchema);
> TypeDescription orcSchema = TypeDescription.fromString(orcInfo.getTypeName());
> Writer orcWriter = OrcWriter.createWriter(orcSchema, driver.generateTmpPath());
>
> *Sample Data:*
> struct<businessDayDate:string,anotherField:string,anotherFieldAgain:string>
>
> On Wed, Jul 15, 2020 at 11:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> > That's good for you, Ryan, because there are many alternatives.
> >
> > FYI, Apache Spark 3.0.0 is using Apache Hive 2.3.7.
> > And everything runs in local mode on a single container.
> >
> > > Would the entire thing run within the same container and then I leverage
> > > the Spark APIs from that in local mode?
> >
> > More simply, you can generate a minimal Scala script at runtime like the
> > following and run it via the Spark shell in that container.
> >
> > $ cat hello.scala
> > print("a")
> > $ bin/spark-shell -I hello.scala
> >
> > Bests,
> > Dongjoon.
> >
> > On Wed, Jul 15, 2020 at 7:34 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > Ryan,
> > >
> > > It's possible there are some changes that would cause that code not to
> > > compile for Hive 2, but I have done some work porting similar processors
> > > to Hive 2, and as I recall it was mostly API-type breaking changes and
> > > not so much on the behavior side of things, more of a Maven and
> > > Java-package-name kind of thing.
> > >
> > > Regards,
> > > Matt
> > >
> > > On Wed, Jul 15, 2020 at 8:39 PM Ryan Schachte
> > > <coderyanschac...@gmail.com> wrote:
> > > >
> > > > Great, thanks Matt! Looking at this code now and I feel it will really
> > > > help me a lot. Anything you think would break using this logic for
> > > > Hive 2.3.5?
> > > >
> > > > On Wed, Jul 15, 2020 at 5:04 PM Matt Burgess <mattyb...@apache.org> wrote:
> > > > >
> > > > > Ryan,
> > > > >
> > > > > In Apache NiFi we have a ConvertAvroToORC processor [1]; you may find
> > > > > code there that you can use in your Java program (take a look at line
> > > > > 212 and down). We had to create our own OrcFileWriter because the one
> > > > > in Apache ORC writes to a FileSystem, where we needed to write to our
> > > > > own FlowFile component. But all the relevant code should be there (you
> > > > > can replace the createWriter() call with the normal ORC one); one
> > > > > caveat is that it's for Apache Hive 1.2, so you may need to make
> > > > > changes if you're using Hive 3 libraries, for example.
> > > > >
> > > > > Regards,
> > > > > Matt
> > > > >
> > > > > [1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java
> > > > >
> > > > > On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte
> > > > > <coderyanschac...@gmail.com> wrote:
> > > > > >
> > > > > > I'm writing a standalone Java process and I'm interested in
> > > > > > converting the consumed Avro messages to ORC. I've seen a plethora
> > > > > > of examples of writing to ORC, but I can't seem to find many
> > > > > > examples of converting Avro to ORC.
> > > > > >
> > > > > > This is just a standard Java process running inside a container.
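---

A minimal sketch of the standalone Avro-to-ORC conversion discussed in this thread, using the Apache ORC core writer (OrcFile, TypeDescription, VectorizedRowBatch) directly rather than the NiFi OrcUtils helpers. It assumes a flat Avro record of string-only fields, matching the sample struct<businessDayDate:string,...> above; the AvroToOrc class name and the field-by-field schema mapping are illustrative assumptions, not code taken from NiFi, Hive, or ORC.

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

import java.io.File;
import java.nio.charset.StandardCharsets;

public class AvroToOrc {

    public static void main(String[] args) throws Exception {
        File avroFile = new File(args[0]);   // e.g. input.avro
        Path orcPath = new Path(args[1]);    // e.g. output.orc

        DataFileReader<GenericRecord> reader =
                new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>());
        Schema avroSchema = reader.getSchema();

        // Hand-rolled schema mapping: every Avro field becomes an ORC string.
        // Nested records, unions, and logical types need a fuller mapping,
        // such as the one in ConvertAvroToORC.java linked above.
        TypeDescription orcSchema = TypeDescription.createStruct();
        for (Schema.Field f : avroSchema.getFields()) {
            orcSchema.addField(f.name(), TypeDescription.createString());
        }

        Writer writer = OrcFile.createWriter(orcPath,
                OrcFile.writerOptions(new Configuration()).setSchema(orcSchema));
        VectorizedRowBatch batch = orcSchema.createRowBatch();

        while (reader.hasNext()) {
            GenericRecord record = reader.next();
            int row = batch.size++;
            for (int col = 0; col < avroSchema.getFields().size(); col++) {
                Object value = record.get(col);
                BytesColumnVector vector = (BytesColumnVector) batch.cols[col];
                if (value == null) {
                    vector.noNulls = false;
                    vector.isNull[row] = true;
                } else {
                    vector.setVal(row, value.toString().getBytes(StandardCharsets.UTF_8));
                }
            }
            // Flush a full batch to the ORC file and start a new one.
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size > 0) {
            writer.addRowBatch(batch);
        }
        writer.close();
        reader.close();
    }
}

Because this writes through OrcFile.createWriter with a Hadoop Path, it works in a plain Java process inside a container without a Hive metastore; the caveat from the thread still applies if you pull in the NiFi/Hive 1.2 OrcUtils classes instead.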