[
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644209#comment-16644209
]
ASF GitHub Bot commented on NIFI-5667:
--------------------------------------
GitHub user mattyb149 opened a pull request:
https://github.com/apache/nifi/pull/3057
NIFI-5667: Add nested record support for PutORC
The basic approach here is that I removed all references to Avro
schemas/fields and replaced them with NiFi Record API concepts. This avoids
having to switch back and forth, since Avro is not the de facto standard
for schemas or flow file content (although we are still fairly tightly coupled
to Avro schemas, but that's a different issue :) )
I'll comment in the PR on various parts of the code to note the "real"
changes to fix the reported issue.
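For context, the crux of nested record support is recursing into child records instead of treating every field as a scalar. A minimal, hypothetical Python sketch of that idea (not the actual Java code in NiFiOrcUtils; the type mapping here is illustrative only):

```python
# Hypothetical sketch: map a (possibly nested) record value to an ORC type
# string by recursing into child records rather than failing on non-scalars.

def to_orc_type(value):
    """Return an ORC type string for a Python value standing in for a Record field."""
    if isinstance(value, bool):      # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):      # nested record -> ORC struct, recurse per field
        inner = ",".join(f"{k}:{to_orc_type(v)}" for k, v in value.items())
        return f"struct<{inner}>"
    raise ValueError(f"Unsupported type: {type(value)}")

print(to_orc_type({"Serial": {"Serial": 11088000000001615}}))
# struct<Serial:struct<Serial:bigint>>
```

The recursion is the point: the pre-fix code effectively stopped at one level, which is why a MapRecord value inside a record could not be converted.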
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
- [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number
you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
- [x] Is your initial contribution a single, squashed commit?
### For code changes:
- [x] Have you ensured that the full suite of tests is executed via
`mvn -Pcontrib-check clean install` at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to
.name (programmatic access) for each of the new properties?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in
which it is rendered?
### Note:
Please ensure that once the PR is submitted, you check Travis CI for build
issues and submit an update to your PR as soon as possible.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mattyb149/nifi NIFI-5667
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/3057.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3057
----
commit ab16d6f77c7dead859643492fe650685ffd7e4ba
Author: Matthew Burgess <mattyb149@...>
Date: 2018-10-09T22:59:00Z
NIFI-5667: Add nested record support for PutORC
----
> Hive3 PutOrc processors, error when using nested Avro Record types
> ------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
>
> I have been testing out the new PutORC processor that was introduced in 1.7
> to see if I can replace the ConvertAvroToOrc processor I currently use.
> When I sent some of the complex Avro messages in my flow through it, I
> encountered the following error (see full stack trace further down):
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> The older ConvertAvroToOrc processor processed the flowfile without issues.
> Also worth noting: the PutORC processor handles the flowfile fine if there is
> no Avro data and only the schema is present. It seems to be related to nested
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>     {
>       "name": "Serial",
>       "type": {
>         "name": "Serial",
>         "namespace": "analytics.models.common.serial",
>         "type": "record",
>         "fields": [
>           { "name": "Serial", "type": "long" }
>         ]
>       }
>     }
>   ]
> }
> {code}
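As a quick stdlib-only sanity check (not NiFi code), one can walk the schema above and see the nesting that triggers the error; `field_types` below is a hypothetical helper, not part of any Avro library:

```python
import json

# The bug.avsc schema from the report, inlined: the "Serial" field's type
# is itself a record, which is the nesting that PutORC failed on.
SCHEMA = json.loads("""
{
  "name": "nifi_hive3_test",
  "namespace": "analytics.models.test",
  "type": "record",
  "fields": [
    {
      "name": "Serial",
      "type": {
        "name": "Serial",
        "namespace": "analytics.models.common.serial",
        "type": "record",
        "fields": [
          {"name": "Serial", "type": "long"}
        ]
      }
    }
  ]
}
""")

def field_types(record_schema):
    """Map each field name to its type, recursing into nested record types."""
    out = {}
    for field in record_schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            out[field["name"]] = field_types(ftype)  # nested record
        else:
            out[field["name"]] = ftype
    return out

print(field_types(SCHEMA))  # {'Serial': {'Serial': 'long'}}
```

The one-level-deep `{'Serial': {'Serial': 'long'}}` shape is the smallest case that reproduces the bug.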
> Small Python script (Python 2) to create an Avro file:
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
>
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 11088000000001615L}})
> writer.close()
>
> # Print what's written into the Avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
>     print user
> {code}
> Then just load the Avro file using ListFile -> FetchFile.
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8]
> org.apache.nifi.processors.orc.PutORC
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> struct<serial:bigint>: java.lang.IllegalArgumentException: Error converting
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type
> struct<serial:bigint>
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> struct<serial:bigint>
> at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)