[
https://issues.apache.org/jira/browse/NIFI-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325404#comment-14325404
]
ASF GitHub Bot commented on NIFI-360:
-------------------------------------
Github user joewitt commented on the pull request:
https://github.com/apache/incubator-nifi/pull/28#issuecomment-74810190
Review Findings:
- Adjusted the LICENSE/NOTICE information for Standard Nar, NiFi source
root, and NiFi assembly.
- Can we cache the compiled JsonPaths [they are threadsafe, etc..]?
- If so whenever onPropertyModified is triggers we know to invalidate the
cache. Can let it get refreshed during the next onTrigger.
- EvaluateJsonPath: Instead of pulling some set batch size use the
@SupportsBatching annotation and the framework can take care of this for you
and will do so in a manner that favors user defined tolerances for latency vs
throughput. In addition this removes the need for the for loop, the
flowfileloop goto, etc.. But it does make the previous suggestion more
important (caching compiled JsonPaths)
- Another contributor has provided an auto-documentation mechanism. Not
sure of its status for being pulled in just yet though. In the mean time please
consider adding documentation pages in src/main/resources/docs....
- EvaluateJsonPath: Line 207. Is exception handling the only possible
option for handling/detecting this? Exceptions create a shocking amount of
stress on GC and overall JVM performance at rate. Having exception handling be
key to the flow control will greatly limit the applicability of this processor
for truly high volume processing. If there is no alternative it is probably
worth documenting this fact as a caveat for users to know in the docs for the
processor. If you'd like to see just how horrible exceptions are for
throughput of the JVM do a simple test of Integer.parseInt to parse a String
(when some of the inputs could be non-Integers) vs a regex to test first if it
appears to be an Int then using Integer.parseInt.
- EvaluateJsonPath: Consider telling the provenance event reporter if you
modified the content of the flow file and providing some details about the
change.
- SplitJson: Exception based handling again to be avoided if possible.
- Both Processors: It appears to be that the library will load the full
input stream in memory then return results, splits, etc.. If so that provides
important information about its applicability on large JSON documents. If so
please document this concern with the processor so users can consider this and
its impact to the heap/memory space.
- SplitJson: Consider telling the provenance event reporter that the splits
are forks from the original.
Summary: Really good stuff here. This will be a very helpful processor.
Please consider making the suggested modifications. I will push to origin this
new branch NIFI-360. Please modify off that and we can work from that PR.
Thanks!
> Create Processors to work against JSON data
> -------------------------------------------
>
> Key: NIFI-360
> URL: https://issues.apache.org/jira/browse/NIFI-360
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Aldrin Piri
> Assignee: Aldrin Piri
> Priority: Minor
> Labels: processor
>
> I have created two Processors, EvaluateJsonPath and SplitJson which are
> analogs of the functionality provided through EvaluateXPath and SpiltXML.
> Both are powered primarily around the usage of [JsonPath by
> Jayway|https://github.com/jayway/JsonPath].
> Their capability descriptions are provided below:
> {panel:title= EvaluateJsonPath}
> Evaluates one or more JsonPath expressions against the content of a FlowFile.
> The results of those expressions are assigned to FlowFile Attributes or are
> written to the content of the FlowFile itself, depending on configuration of
> the Processor. JsonPaths are entered by adding user-defined properties; the
> name of the property maps to the Attribute Name into which the result will be
> placed (if the Destination is flowfile-attribute; otherwise, the property
> name is ignored).
> The value of the property must be a valid JsonPath expression. If the
> JsonPath evaluates to a JSON array or JSON object and the Return Type is set
> to 'scalar' the FlowFile will be unmodified and will be routed to failure. A
> Return Type of JSON can return scalar values if the provided JsonPath
> evaluates to the specified value and will be routed as a match. If
> Destination is 'flowfile-content' and the JsonPath does not evaluate to a
> defined path, the FlowFile will be routed to 'unmatched' without having its
> contents modified. If Destination is flowfile-attribute and the expression
> matches nothing, attributes will be created with empty strings as the value,
> and the FlowFile will always be routed to 'matched.'
> {panel}
> {panel:title=SplitJson}
> Splits a JSON File into multiple, separate FlowFiles for an array element
> specified by a JsonPath expression. Each generated FlowFile is comprised of
> an element of the specified array and transferred to relationship 'split,
> with the original file transferred to the 'original' relationship. If the
> specified JsonPath is not found or does not evaluate to an array element,
> the original file is routed to 'failure' and no files are generated.
> {panel}
> One item of note is the transitive dependency of ASM through Json-Smart
> through JsonPath.
> I have included, what I believe is needed to appropriately make use of this
> item in the LICENSE. Review of its correctness would is requested.
> Any feedback is appreciated. Thanks!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)