arenger commented on issue #3414: NIFI-5900 Add a SplitLargeJson processor
URL: https://github.com/apache/nifi/pull/3414#issuecomment-485135173
 
 
   No problem about the delay; I hope you had a good vacation.  As for the 
additional detail in the HTML documentation you mentioned: yes, I could do 
that, but I don't know where to add it.  Did you take a look at the 
`@CapabilityDescription` that I added for `SplitLargeJson`?  I tried to make 
that wording clear but also succinct.  I could add more detail there, or is 
there a better place where I should expound on the function of the processor?
   
   Also, as I mentioned in [an above 
comment](https://github.com/apache/nifi/pull/3414#issuecomment-482096603), I 
think there are four roads we could take from here:
   
   1) Create a new `SplitJsonProcessor` that uses `javax.json` (this PR)
   2) Create a new `SplitJsonProcessor` that uses `JsonSurfer`
   3) Keep only `SplitJson` and optionally employ a streaming approach, backed 
by `javax.json`, when a new property is set
   4) Keep only `SplitJson` and optionally employ a streaming approach, backed 
by `JsonSurfer`, when a new property is set
   
   I looked briefly into the 2nd and 4th options but have yet to confirm 
whether the memory usage is comparable to that of the `javax.json` approach.  
In order to use `JsonSurfer` in NiFi, it looks like we'd need to suppress the 
version of ANTLR that is pulled in from the 
[nifi-syslog-utils](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-syslog-utils/pom.xml#L28)
 module (via `simple-syslog-5424`) and explicitly replace it with version 
`4.7.2` of `antlr4-runtime`.  After I did that, I was able to run `JsonSurfer` 
without a runtime error.
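
   Roughly, that kind of override could look like the fragment below in the 
`pom.xml` of whichever module would use `JsonSurfer`.  This is only a sketch, 
not the exact change: the Maven coordinates for `nifi-syslog-utils` and 
`antlr4-runtime` are assumed, and the `nifi-syslog-utils` version is assumed 
to be managed by a parent pom.

```xml
<!-- Sketch only: coordinates are assumptions, not copied from the PR. -->
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-syslog-utils</artifactId>
    <!-- version assumed to be managed by the parent pom -->
    <exclusions>
        <!-- Drop the ANTLR runtime that arrives transitively
             via simple-syslog-5424. -->
        <exclusion>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-runtime</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<!-- Declare the ANTLR runtime version explicitly instead. -->
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7.2</version>
</dependency>
```

   Running `mvn dependency:tree` on that module should show which 
`antlr4-runtime` version ends up on the classpath.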
   
   `JsonSurfer` does have wider support for the JSON Path specification.  If we 
went that route, I'd suggest we create a new processor called "JsonExtract", or 
something, that would simply receive a JSON file and a JSON Path.  It would 
output zero, one, or more JSON documents from the incoming document.  The 
notion of "splitting" isn't really the best description at that point, since 
the full JSON Path specification can be used to specify any part -- or set of 
parts -- of a JSON document.
