[
https://issues.apache.org/jira/browse/NIFI-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934836#comment-15934836
]
Ryan Persaud commented on NIFI-3625:
------------------------------------
I split the existing monolithic onTrigger() function in PutHiveStreaming into
multiple functions so that the code could be reused with other content types
like JSON. Incoming JSON content may either be a single JSON element, or an
array of elements. The incoming content type can either be explicitly
specified as JSON or Avro, or the mime.type attribute can be used. I cribbed
that code from the InferAvroSchema processor.
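The content-type resolution described above could look roughly like the following. This is an illustrative sketch, not the actual PR code: the class name, the property values, and the MIME strings are assumptions standing in for whatever the processor actually uses.

```java
import java.util.Map;

// Hypothetical sketch of resolving the effective content type from either an
// explicit processor property or the flowfile's mime.type attribute.
public class ContentTypeResolver {
    static final String USE_MIME_TYPE = "Use mime.type attribute"; // assumed property value
    static final String JSON = "JSON";
    static final String AVRO = "Avro";

    // Returns the resolved type, or null when mime.type is missing/unsupported
    // (in which case the flowfile would be routed to an "unsupported content"
    // relationship, as described in the comment above).
    static String resolve(String configuredType, Map<String, String> flowFileAttributes) {
        if (!USE_MIME_TYPE.equals(configuredType)) {
            return configuredType; // explicitly configured as JSON or Avro
        }
        final String mime = flowFileAttributes.get("mime.type");
        if ("application/json".equals(mime)) {
            return JSON;
        }
        if ("application/avro-binary".equals(mime)) {
            return AVRO;
        }
        return null; // unsupported or missing mime.type
    }
}
```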
In order to support multiple content types, I made HiveStreamingRecord generic,
and I added an interface, IHiveStreamingRecordWriter<T>, with a single function,
writeRecords(), which appends records to a flowfile. This function replaces the
Avro-specific appendRecordsToFlowFile().
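The generic record/writer split described above could be sketched as follows. This is a toy illustration of the shape of the abstraction only: the real writer works against NiFi flowfiles and the Hive Streaming API (e.g. StrictJsonWriter), which are omitted here, and the String-backed implementation below is purely hypothetical.

```java
import java.util.List;

// Generic wrapper around a record of arbitrary content type (Avro, JSON, ...).
class HiveStreamingRecord<T> {
    private final T record;
    HiveStreamingRecord(T record) { this.record = record; }
    T getRecord() { return record; }
}

// Content-type-specific writer; the real implementation appends to a flowfile.
interface IHiveStreamingRecordWriter<T> {
    // Appends the given records to the output; returns the number of bytes written.
    long writeRecords(List<HiveStreamingRecord<T>> records);
}

// Toy String-based implementation standing in for a JSON writer.
class JsonRecordWriter implements IHiveStreamingRecordWriter<String> {
    private final StringBuilder out = new StringBuilder();

    @Override
    public long writeRecords(List<HiveStreamingRecord<String>> records) {
        final long before = out.length();
        for (HiveStreamingRecord<String> r : records) {
            out.append(r.getRecord()).append('\n');
        }
        return out.length() - before;
    }

    String contents() { return out.toString(); }
}
```

The point of the interface is that onTrigger() can stay content-type agnostic and just hand a list of records to whichever writer was selected.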
There is a section of code (lines 453-466) in the existing onTrigger() that
creates a JSON object from the Avro fields, but the JSON object is never used.
This code seems superfluous since we call .toString() on the Avro object to
generate the JSON that is passed to the StrictJsonWriter, so I removed it. It's
distinctly possible that I missed the point of that code, so please let me know
if it serves a purpose.
I duplicated the existing test cases and modified them to work with JSON. I
also added JSON-specific test cases that verify that both single JSON elements
and JSON arrays work as expected. Finally, I added test cases to verify that
the mime.type detection code functions as expected. All tests and checkstyle
checks passed.
I tested with a HDP 2.5 sandbox by building against the hadoop libraries
suggested by Matt Burgess in NIFI-2448:
mvn clean install -Phortonworks -Dhive.version=1.2.1000.2.5.0.0-1245
-Dhadoop.version=2.7.3.2.5.0.0-1245
I created a single PutHiveStreaming instance that was set to use mime.type, and
I verified that it could load both JSON and binary Avro content into Hive.
Then, I verified that explicitly setting the input content type worked as well.
Finally, I verified that if mime.type is in use, but a record has an
unsupported or missing mime.type attribute, the record is transferred to the
unsupported content relationship.
Using the DelimitedInputWriter, it should be fairly straightforward to add CSV
support to the PutHiveStreaming processor, and I hope to tackle that in a
subsequent PR.
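For the CSV follow-up: Hive's DelimitedInputWriter consumes each record as a single delimited byte array, so a CSV-capable writer would mostly need to join a record's field values with the configured delimiter. A minimal sketch of that formatting step, with an entirely hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that turns a record's field values into the delimited
// byte array a DelimitedInputWriter-backed implementation would hand to Hive.
class DelimitedRecordFormatter {
    private final char delimiter;

    DelimitedRecordFormatter(char delimiter) {
        this.delimiter = delimiter;
    }

    byte[] format(List<String> fieldValues) {
        final String line = fieldValues.stream()
                .collect(Collectors.joining(String.valueOf(delimiter)));
        return line.getBytes(StandardCharsets.UTF_8);
    }
}
```

A real implementation would also have to handle quoting/escaping of field values that contain the delimiter, which is omitted here.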
> Add JSON support to PutHiveStreaming
> ------------------------------------
>
> Key: NIFI-3625
> URL: https://issues.apache.org/jira/browse/NIFI-3625
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.2.0
> Reporter: Ryan Persaud
> Fix For: 1.2.0
>
>
> As noted in a Hortonworks Community Connection post
> (https://community.hortonworks.com/questions/88424/nifi-puthivestreaming-requires-avro.html),
> PutHiveStreaming does not currently support JSON Flow File content. I've
> completed the code to allow JSON flow files to be streamed into hive, and I'm
> currently working on test cases and updated documentation. I should have a
> PR to submit this week.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)