[ 
https://issues.apache.org/jira/browse/NIFI-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne resolved NIFI-7940.
------------------------------
    Resolution: Duplicate

> Create a ScriptedPartitionRecord
> --------------------------------
>
>                 Key: NIFI-7940
>                 URL: https://issues.apache.org/jira/browse/NIFI-7940
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Mark Payne
>            Priority: Major
>
> In 1.12.0, we introduced the ScriptedTransformRecord. This has worked very 
> well for many different transformations that are simple in code but very 
> difficult with the DSLs that NiFi supports. In addition to transforming 
> records, another use case that can be made dramatically easier with scripting 
> is partitioning records.
> The PartitionRecord processor is very powerful and easy to use, but 
> RecordPath is somewhat limited in the functions that it provides. For 
> example, recently in the Apache Slack channel, we had someone asking about 
> how to route data based on whether or not the "timestamp" field matches a 
> given regular expression. Attempts were made using QueryRecord with RPATH, but 
> that didn't work because the timestamp field is a top-level field, not a 
> Record. We tried UpdateRecord, but that failed because the matchesRegex 
> function of RecordPath is a predicate, so it can't be used to partition on. 
> Eventually a pattern was found with QueryRecord using `SIMILAR TO`, but 
> that operator does not support regular expressions.
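
For illustration, the routing rule described above (does the top-level "timestamp" field match a regular expression?) is only a few lines in a scripting language. This is a minimal sketch in plain Python, not NiFi code: the record is modeled as a dict, and the pattern, the function name `partition_for`, and the labels "match" and "no_match" are all hypothetical.

```python
import re

# Hypothetical pattern ("YYYY-MM-DD HH:MM:SS"); the issue does not say which
# regular expression was actually needed.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def partition_for(record):
    """Return a partition label based on the top-level "timestamp" field."""
    value = record.get("timestamp")
    # fullmatch requires the whole value to match, not just a prefix.
    if value is not None and TIMESTAMP_PATTERN.fullmatch(str(value)):
        return "match"
    return "no_match"
```

A record with a missing or malformed timestamp falls into "no_match" here; whether such records should instead be dropped is a design choice the issue leaves open.
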
> A scripted processor would likely make this far simpler to handle. This 
> processor should focus on making it dead simple to partition (and 
> subsequently route based on added attributes) records with a scripting 
> language. So, it will be important, like ScriptedTransformRecord, to make the 
> processor geared more toward ease of use than toward exposing the full power 
> of FlowFiles, sessions, etc.
> The script should have the same bindings as ScriptedTransformRecord:
>  * attributes
>  * log
>  * record
>  * recordIndex
> Unlike ScriptedTransformRecord, though, the ScriptedPartitionRecord should 
> return one of three things:
>  * A string (or a primitive value, such as an int, that can be turned into a 
> String) representing the partition for the Record
>  * A collection of strings/primitives representing multiple partitions that 
> the Record should go to (indicating that the Record should be added to 
> multiple Record Writers)
>  * A null value or an empty collection indicating that the Record should be 
> dropped.
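
The three return cases listed above could be reduced to a single "list of partition names" before any writing happens. The following is a rough sketch in plain Python under stated assumptions: `normalize_partitions` is a hypothetical helper name, and skipping null items inside a collection is an assumption that the issue does not specify.

```python
def normalize_partitions(result):
    """Turn a script's return value into a list of partition names.

    An empty list means the record should be dropped.
    """
    if result is None:
        return []  # null value: drop the record
    if isinstance(result, (list, tuple, set, frozenset)):
        # Collection: one partition per item. Skipping None items is an
        # assumption; the issue does not say how they should be handled.
        return [str(item) for item in result if item is not None]
    # Single string or primitive: exactly one partition.
    return [str(result)]
```

With this shape, "drop the record" and "route to N partitions" become the same code path, with N equal to zero in the first case.
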
> The processor should then write the Record to a FlowFile for each of the 
> Partitions returned. For each outbound FlowFile, an attribute should be added 
> indicating the partition for that FlowFile for easy follow-on routing via 
> RouteOnAttribute, etc.
> The processor should keep a counter for how many Records were dropped.
> The processor should keep counters for how many Records were routed to each 
> Partition.
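
The write-and-count behavior described above might look roughly like the sketch below. It is plain Python rather than NiFi code: FlowFiles are modeled as dicts, and the attribute name `partition`, the counter names, and the function `route_records` are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict

PARTITION_ATTRIBUTE = "partition"  # hypothetical attribute name


def route_records(records, script):
    """Group records by the partitions a script returns and keep counters.

    `script` is called as script(record, record_index) and may return None,
    a single value, or a collection of values.
    """
    groups = defaultdict(list)  # partition name -> records for that FlowFile
    counters = Counter()

    for index, record in enumerate(records):
        result = script(record, index)
        if result is None:
            partitions = []
        elif isinstance(result, (list, tuple, set, frozenset)):
            partitions = [str(p) for p in result]
        else:
            partitions = [str(result)]

        if not partitions:
            counters["Records Dropped"] += 1
            continue
        for name in partitions:
            groups[name].append(record)
            counters["Records Routed to " + name] += 1

    # One outbound "FlowFile" per partition, tagged for follow-on routing.
    flowfiles = [
        {"attributes": {PARTITION_ATTRIBUTE: name}, "records": recs}
        for name, recs in groups.items()
    ]
    return flowfiles, counters
```

A record returned for two partitions is appended to both groups, which mirrors the "multiple Record Writers" case in the list of return values.
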
> The processor should include additionalDetails.html to provide sufficient 
> documentation. Similar to ScriptedTransformRecord, the documentation should 
> include several examples, spanning at least Groovy and Python (since those 
> are by far the languages we most often see used in script processors). 
> Each example provided in the additionalDetails.html should also have an 
> accompanying unit test to verify the behavior.
> Given the similarities to the ScriptedTransformRecord processor, there's a 
> high likelihood that the ScriptedTransformRecord processor could be 
> refactored into an AbstractScriptedRecord processor with both 
> ScriptedTransformRecord and ScriptedPartitionRecord extending from it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
