[ 
https://issues.apache.org/jira/browse/NIFI-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934836#comment-15934836
 ] 

Ryan Persaud commented on NIFI-3625:
------------------------------------

I split the existing monolithic onTrigger() function in PutHiveStreaming into 
multiple functions so that the code could be reused with other content types 
like JSON.  Incoming JSON content may be either a single JSON element or an 
array of elements.  The incoming content type can either be specified 
explicitly as JSON or Avro, or inferred from the mime.type attribute; I 
cribbed that inference code from the InferAvroSchema processor.
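
The resolution logic might look roughly like the following sketch.  All names 
here (class, enum, MIME strings) are illustrative assumptions, not taken from 
the actual patch:

```java
// Hypothetical sketch of content-type resolution: prefer an explicit
// setting, otherwise fall back to the flowfile's mime.type attribute.
public class ContentTypeResolver {
    public enum ContentType { AVRO, JSON, UNSUPPORTED }

    public static ContentType resolve(String explicitType, String mimeType) {
        // An explicitly configured type wins over the attribute.
        if ("Avro".equalsIgnoreCase(explicitType)) return ContentType.AVRO;
        if ("JSON".equalsIgnoreCase(explicitType)) return ContentType.JSON;
        // Otherwise inspect mime.type, as InferAvroSchema does.
        if (mimeType == null) return ContentType.UNSUPPORTED;
        switch (mimeType) {
            case "application/json":        return ContentType.JSON;
            case "application/avro-binary": return ContentType.AVRO;
            default:                        return ContentType.UNSUPPORTED;
        }
    }
}
```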

In order to support multiple content types, I made HiveStreamingRecord generic, 
and I added an interface, IHiveStreamingRecordWriter<T>, that has a single 
function writeRecords() which appends records to a flowfile.  This function 
replaces the Avro-specific appendRecordsToFlowFile().
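
A minimal sketch of that writer abstraction, with an illustrative 
implementation; the real interface in the PR may differ in signatures and 
naming:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a generic writer that appends records to the
// outgoing flowfile content, replacing the Avro-specific
// appendRecordsToFlowFile().
interface IHiveStreamingRecordWriter<T> {
    void writeRecords(List<T> records);
}

// Toy JSON-flavored implementation, for illustration only: it just
// accumulates the record strings it is asked to write.
class JsonRecordWriter implements IHiveStreamingRecordWriter<String> {
    private final List<String> written = new ArrayList<>();

    @Override
    public void writeRecords(List<String> records) {
        written.addAll(records);
    }

    public int writtenCount() {
        return written.size();
    }
}
```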

There is a section of code (lines 453-466) in the existing onTrigger() that 
creates a JSON object from the Avro fields, but the JSON object is never used.  
This code appears superfluous, since we call toString() on the Avro object to 
generate the JSON that is passed to the StrictJsonWriter, so I removed it.  
It's distinctly possible that I missed the point of that code, so please let 
me know if it serves a purpose.

I duplicated the existing test cases and modified them to work with JSON.  I 
also added JSON-specific test cases that verify that both single JSON elements 
and JSON arrays work as expected.  Finally, I added test cases to verify that 
the mime.type detection code functions as expected.  All tests pass, and the 
checkstyle checks succeed.

I tested with a HDP 2.5 sandbox by building against the hadoop libraries 
suggested by Matt Burgess in NIFI-2448:
mvn clean install -Phortonworks -Dhive.version=1.2.1000.2.5.0.0-1245 
-Dhadoop.version=2.7.3.2.5.0.0-1245

I created a single PutHiveStreaming instance that was set to use mime.type, and 
I verified that it could load both JSON and binary Avro content into Hive.  
Then, I verified that explicitly setting the input content type worked as well. 
 Finally, I verified that if mime.type is in use, but a record has an 
unsupported or missing mime.type attribute, the record is transferred to the 
unsupported content relationship.

Using the DelimitedInputWriter, it should be fairly straightforward to add CSV 
support to the PutHiveStreaming processor, and I hope to tackle that in a 
subsequent PR.
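
For CSV, the conversion step could amount to joining field values with the 
configured delimiter before handing the line to the DelimitedInputWriter.  
The sketch below is illustrative only (the class and method names are my own, 
and the eventual PR would need to handle quoting, escaping, and column 
ordering):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: join record field values into one delimited
// line, rendering nulls as empty fields.
public class DelimitedRecordSketch {
    public static String toDelimitedLine(List<Object> fieldValues, char delimiter) {
        return fieldValues.stream()
                .map(v -> v == null ? "" : v.toString())
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }
}
```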

> Add JSON support to PutHiveStreaming
> ------------------------------------
>
>                 Key: NIFI-3625
>                 URL: https://issues.apache.org/jira/browse/NIFI-3625
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.2.0
>            Reporter: Ryan Persaud
>             Fix For: 1.2.0
>
>
> As noted in a Hortonworks Community Connection post 
> (https://community.hortonworks.com/questions/88424/nifi-puthivestreaming-requires-avro.html),
>  PutHiveStreaming does not currently support JSON Flow File content.  I've 
> completed the code to allow JSON flow files to be streamed into hive, and I'm 
> currently working on test cases and updated documentation.  I should have a 
> PR to submit this week.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
