[ 
https://issues.apache.org/jira/browse/NIFI-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325404#comment-14325404
 ] 

ASF GitHub Bot commented on NIFI-360:
-------------------------------------

Github user joewitt commented on the pull request:

    https://github.com/apache/incubator-nifi/pull/28#issuecomment-74810190
  
    Review Findings:
    
    - Adjusted the LICENSE/NOTICE information for Standard Nar, NiFi source 
root, and NiFi assembly.
    
    - Can we cache the compiled JsonPaths [they are threadsafe, etc..]?
      - If so whenever onPropertyModified is triggers we know to invalidate the 
cache.  Can let it get refreshed during the next onTrigger.
      
    - EvaluateJsonPath: Instead of pulling some set batch size use the 
@SupportsBatching annotation and the framework can take care of this for you 
and will do so in a manner that favors user defined tolerances for latency vs 
throughput.  In addition this removes the need for the for loop, the 
flowfileloop goto, etc..  But it does make the previous suggestion more 
important (caching compiled JsonPaths)
    
    - Another contributor has provided an auto-documentation mechanism.  Not 
sure of its status for being pulled in just yet though. In the mean time please 
consider adding documentation pages in src/main/resources/docs....
    
    - EvaluateJsonPath: Line 207.  Is exception handling the only possible 
option for handling/detecting this?  Exceptions create a shocking amount of 
stress on GC and overall JVM performance at rate.  Having exception handling be 
key to the flow control will greatly limit the applicability of this processor 
for truly high volume processing.  If there is no alternative it is probably 
worth documenting this fact as a caveat for users to know in the docs for the 
processor.  If you'd like to see just how horrible exceptions are for 
throughput of the JVM do a simple test of Integer.parseInt to parse a String 
(when some of the inputs could be non-Integers) vs a regex to test first if it 
appears to be an Int then using Integer.parseInt.
    
    - EvaluateJsonPath: Consider telling the provenance event reporter if you 
modified the content of the flow file and providing some details about the 
change.
    
    - SplitJson: Exception based handling again to be avoided if possible.
    
    - Both Processors: It appears to be that the library will load the full 
input stream in memory then return results, splits, etc..  If so that provides 
important information about its applicability on large JSON documents.  If so 
please document this concern with the processor so users can consider this and 
its impact to the heap/memory space.
    
    - SplitJson: Consider telling the provenance event reporter that the splits 
are forks from the original.
    
    Summary: Really good stuff here.  This will be a very helpful processor.  
Please consider making the suggested modifications.  I will push to origin this 
new branch NIFI-360.  Please modify off that and we can work from that PR.  
Thanks!


> Create Processors to work against JSON data
> -------------------------------------------
>
>                 Key: NIFI-360
>                 URL: https://issues.apache.org/jira/browse/NIFI-360
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Aldrin Piri
>            Assignee: Aldrin Piri
>            Priority: Minor
>              Labels: processor
>
> I have created two Processors, EvaluateJsonPath and SplitJson which are 
> analogs of the functionality provided through EvaluateXPath and SpiltXML.
> Both are powered primarily around the usage of [JsonPath by 
> Jayway|https://github.com/jayway/JsonPath].
> Their capability descriptions are provided below:
> {panel:title= EvaluateJsonPath}
> Evaluates one or more JsonPath expressions against the content of a FlowFile. 
>  The results of those expressions are assigned to FlowFile Attributes or are 
> written to the content of the FlowFile itself, depending on configuration of 
> the Processor. JsonPaths are entered by adding user-defined properties; the 
> name of the property maps to the Attribute Name into which the result will be 
> placed (if the Destination is flowfile-attribute; otherwise, the property 
> name is ignored). 
> The value of the property must be a valid JsonPath expression. If the 
> JsonPath evaluates to a JSON array or JSON object and the Return Type is set 
> to 'scalar' the FlowFile will be unmodified and will be routed to failure. A 
> Return Type of JSON can return scalar values if the provided JsonPath 
> evaluates to the specified value and will be routed as a match. If 
> Destination is 'flowfile-content' and the JsonPath does not evaluate to a 
> defined path, the FlowFile will be routed to 'unmatched' without having its 
> contents modified. If Destination is flowfile-attribute and the expression 
> matches nothing, attributes will be created with empty strings as the value, 
> and the FlowFile will always be routed to 'matched.'
> {panel}
> {panel:title=SplitJson}
> Splits a JSON File into multiple, separate FlowFiles for an array element 
> specified by a JsonPath expression. Each generated FlowFile is comprised of 
> an element of the specified array and transferred to relationship 'split, 
> with the original file transferred to the 'original' relationship. If the 
> specified JsonPath is not found or  does not evaluate to an array element, 
> the original file is routed to 'failure' and no files are generated.
> {panel}
> One item of note is the transitive dependency of ASM through Json-Smart 
> through JsonPath.
> I have included, what I believe is needed to appropriately make use of this 
> item in the LICENSE.  Review of its correctness would is requested.
> Any feedback is appreciated.  Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to