[ 
https://issues.apache.org/jira/browse/NIFI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438062#comment-16438062
 ] 

Matt Burgess commented on NIFI-4456:
------------------------------------

For the readers, we can relax the constraint that the incoming flow file is 
either a single JSON array or a single JSON object (i.e. one well-formed JSON 
document). If instead we simply "jump into" JSON arrays and treat each element 
as a record, then the existing parser will also handle formats that are not a 
single well-formed document, such as object-per-line. It will also handle odd 
cases such as an array followed by whitespace followed by a JSON object; the 
rule of thumb will be that an incoming flow file "is expected to consist of 
any combination of JSON arrays and objects separated by optional whitespace".
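
To make the rule of thumb concrete, here is a minimal sketch of the reading 
behavior in Python. It is not the NiFi reader: the function name iter_records 
is hypothetical, and rejecting bare top-level scalars is my own assumption 
about how "arrays and objects" would be enforced.

```python
import json

JSON_WHITESPACE = " \t\r\n"

def iter_records(text):
    """Yield records from text made of JSON arrays and objects
    separated by optional whitespace (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos >= end:
            return
        # Parse exactly one top-level value and get the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # "Jump into" the array: each element is its own record.
            yield from value
        elif isinstance(value, dict):
            yield value
        else:
            # Assumption: bare scalars at the top level are not records.
            raise ValueError("expected a JSON array or object")
```

With this, an array followed by whitespace followed by objects, one per line, 
yields a flat stream of records; an empty or whitespace-only input yields none.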

For the writer, we can offer an "Output Grouping" property that defaults to 
"Array" (to maintain the current behavior of outputting JSON records as a JSON 
array) and also offers "One Object Per Line". From an implementation 
standpoint, we can use a MinimalPrettyPrinter with a newline as the record 
separator for that case. We would also not allow the Pretty Print property to 
be set to "true" when "One Object Per Line" is selected, since pretty-printing 
would spread a single record across multiple lines.
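
A rough Python sketch of the two writer modes and the validation rule follows. 
It only illustrates the intended output and is not the Jackson-based 
implementation; the function name write_records and its parameters are 
hypothetical stand-ins for the writer and its properties.

```python
import json

def write_records(records, output_grouping="Array", pretty_print=False):
    """Serialize records per the proposed Output Grouping (hypothetical)."""
    records = list(records)
    if output_grouping == "One Object Per Line":
        if pretty_print:
            # Pretty-printing would put newlines inside a record.
            raise ValueError(
                "Pretty Print cannot be used with One Object Per Line")
        # A newline separates records; none is added after the last one.
        return "\n".join(json.dumps(record) for record in records)
    if output_grouping == "Array":
        # Current behavior: all records as elements of one JSON array.
        return json.dumps(records, indent=2 if pretty_print else None)
    raise ValueError("unknown Output Grouping: " + output_grouping)
```

For the two example records from the issue description, the "One Object Per 
Line" mode produces two lines, each holding one compact JSON object.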

> Update JSON Record Reader / Writer to allow for 'json per line' format
> ----------------------------------------------------------------------
>
>                 Key: NIFI-4456
>                 URL: https://issues.apache.org/jira/browse/NIFI-4456
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Matt Burgess
>            Priority: Major
>
> It is common, especially for archiving purposes, to have many JSON objects 
> combined with newlines in between, in order to delimit the records. It would 
> be useful to allow record readers and writers to support this, instead of 
> requiring that JSON records be elements in a JSON array.
> For example, the following JSON is considered two records:
> {code}
> [
>   { "greeting" : "hello", "id" : 1 },
>   { "greeting" : "good-bye", "id" : 2 }
> ]
> {code}
> It would be beneficial to also support the format:
> {code}
> { "greeting" : "hello", "id" : 1 }
> { "greeting" : "good-bye", "id" : 2 }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
