[ https://issues.apache.org/jira/browse/NIFI-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Payne resolved NIFI-7940.
------------------------------
Resolution: Duplicate
> Create a ScriptedPartitionRecord
> --------------------------------
>
> Key: NIFI-7940
> URL: https://issues.apache.org/jira/browse/NIFI-7940
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Mark Payne
> Priority: Major
>
> In 1.12.0, we introduced the ScriptedTransformRecord processor. It has worked
> very well for many different transformations that are simple in code but very
> difficult with the DSLs that NiFi supports. In addition to transforming
> records, another use case that can be made dramatically easier with scripting
> is partitioning records.
> The PartitionRecord processor is very powerful and easy to use, but
> RecordPath is somewhat limited in the functions that it provides. For
> example, recently in the Apache Slack channel, we had someone asking about
> how to route data based on whether or not the "timestamp" field matches a
> given regular expression. Attempts were made using QueryRecord with RPATH, but
> that didn't work because the timestamp field is a top-level field, not a
> Record. An attempt with UpdateRecord also failed because the matchesRegex
> function of RecordPath is a predicate, so it can't be used to partition on.
> Eventually a pattern was found with QueryRecord using `SIMILAR TO`, but
> that operator does not support full regular expressions.
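To make the Slack example concrete, here is a minimal Python sketch of the kind of partitioning script the proposed processor could run. It is an illustration under stated assumptions, not NiFi code: the bindings (`record`, `recordIndex`, `attributes`, `log`) are passed as function parameters so the logic can be exercised outside NiFi, `FakeRecord` is a hypothetical stand-in that mimics only `getValue` from NiFi's Record, and the ISO-8601-style pattern and the `valid`/`invalid` partition names are made up for the example.

```python
import re
from typing import Any, Dict, Optional


class FakeRecord:
    """Hypothetical stand-in for a NiFi Record; only getValue() is mimicked."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = values

    def getValue(self, field: str) -> Any:
        return self._values.get(field)


# Assumed timestamp format for illustration, e.g. "2020-10-20T14:05:00".
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def partition(record: Any, recordIndex: int, attributes: Dict[str, str], log: Any = None) -> Optional[str]:
    """Return the partition name for one record, or None to drop it."""
    timestamp = record.getValue("timestamp")
    if timestamp is None:
        return None  # no timestamp at all: drop the record
    # fullmatch requires the whole value to match, not just a prefix
    if TIMESTAMP_PATTERN.fullmatch(str(timestamp)):
        return "valid"
    return "invalid"
```

With a processor like the one proposed, follow-on routing would then only need to check the partition attribute for `valid` or `invalid`, for example in RouteOnAttribute.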
> A scripted processor would likely make this trivial to handle. This
> processor should be focused on making it dead simple to partition records
> with a scripting language (and subsequently route them based on the added
> attributes). So, as with ScriptedTransformRecord, it will be important to
> gear the processor toward ease of use rather than exposing the full power of
> FlowFiles, sessions, etc.
> The script should have the same bindings as ScriptedTransformRecord:
> * attributes
> * log
> * record
> * recordIndex
> Unlike ScriptedTransformRecord, though, the ScriptedPartitionRecord should
> return one of three things:
> * A string (or a primitive value, such as an int, that can be converted to a
> String) representing the partition for the Record
> * A collection of strings/primitives representing multiple partitions that
> the Record should go to (indicating that the Record should be added to
> multiple Record Writers)
> * A null value or an empty collection indicating that the Record should be
> dropped.
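The three permitted return shapes imply a normalization step between the script and the Record Writers. The sketch below shows one way to do it in Python; `to_partitions` is a hypothetical helper name, and two behaviors are assumptions beyond the spec: `None` entries inside a collection are ignored, and duplicate partition names collapse to one so a record is written at most once per partition.

```python
from collections.abc import Iterable
from typing import Any, List


def to_partitions(result: Any) -> List[str]:
    """Normalize a partition script's return value into partition names.

    None or an empty collection -> []  (the record is dropped)
    a string or primitive       -> [str(result)]
    a collection                -> one name per element
    """
    if result is None:
        return []
    # Strings are iterable, so handle them before the generic collection case.
    if isinstance(result, str):
        return [result]
    if isinstance(result, Iterable):
        # Assumption: skip None entries and drop duplicates, keeping first-seen order.
        names = (str(item) for item in result if item is not None)
        return list(dict.fromkeys(names))
    # Any other primitive (int, float, ...) becomes its string form.
    return [str(result)]
```

An empty result list is the single "drop" signal, which keeps the null case and the empty-collection case from needing separate handling downstream.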
> The processor should then write the Record to a FlowFile for each of the
> Partitions returned. For each outbound FlowFile, an attribute should be added
> indicating the partition for that FlowFile for easy follow-on routing via
> RouteOnAttribute, etc.
> The processor should keep a counter for how many Records were dropped.
> The processor should keep counters for how many Records were routed to each
> Partition.
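The write-per-partition and counter requirements can be sketched together. This Python version is a simplified model, not NiFi's API: each outbound "FlowFile" is a plain dict of attributes plus records, the attribute name `partition` and the counter names are hypothetical, and, to stay short, the script is assumed to return a list of partition names already (an empty list meaning "drop").

```python
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, List, Tuple

PARTITION_ATTRIBUTE = "partition"  # hypothetical attribute name


def partition_records(
    records: List[Any],
    script: Callable[[Any, int], List[str]],
) -> Tuple[List[Dict[str, Any]], Counter]:
    """Group records into one output per partition and count the outcomes."""
    writers: Dict[str, List[Any]] = defaultdict(list)  # one "Record Writer" per partition
    counters: Counter = Counter()
    for index, record in enumerate(records):
        partitions = script(record, index)
        if not partitions:
            counters["Records Dropped"] += 1
            continue
        # A record returned with several partitions is written to each of them.
        for name in partitions:
            writers[name].append(record)
            counters[f"Records Routed to Partition {name}"] += 1
    flowfiles = [
        {"attributes": {PARTITION_ATTRIBUTE: name}, "records": recs}
        for name, recs in writers.items()
    ]
    return flowfiles, counters
```

Keeping one writer per partition in a map means each record is handled in a single pass, and the partition attribute is set once per outbound FlowFile rather than per record.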
> The processor should include an additionalDetails.html page to provide
> sufficient documentation. Similar to ScriptedTransformRecord, the documentation
> should include several examples, spanning at least Groovy and Python (since
> those are by far the languages we see most often in script processors).
> Each example provided in the additionalDetails.html should also have an
> accompanying unit test to verify the behavior.
> Given the similarities between the two processors, ScriptedTransformRecord
> could very likely be refactored into an AbstractScriptedRecord processor, with
> both ScriptedTransformRecord and ScriptedPartitionRecord extending from it.
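The suggested refactoring is essentially a template method: the base class owns the per-record loop and script invocation, and each subclass decides what to do with the script's result. The Python sketch below illustrates that shape only. These are hypothetical Python classes reusing the names from the issue, not NiFi's Java implementation, and the transform subclass is simplified to "a null result drops the record".

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

# Script bindings modeled as parameters: record, recordIndex, attributes.
Script = Callable[[Any, int, Dict[str, str]], Any]


class AbstractScriptedRecord(ABC):
    """Shared part: run the user script once per record with common bindings."""

    def __init__(self, script: Script) -> None:
        self.script = script

    def process(self, records: List[Any], attributes: Dict[str, str]) -> Any:
        state = self.begin()
        for index, record in enumerate(records):
            result = self.script(record, index, attributes)
            self.handle_result(state, record, result)
        return state

    @abstractmethod
    def begin(self) -> Any:
        """Create the per-FlowFile output state."""

    @abstractmethod
    def handle_result(self, state: Any, record: Any, result: Any) -> None:
        """Interpret the script's return value for one record."""


class ScriptedTransformRecord(AbstractScriptedRecord):
    def begin(self) -> List[Any]:
        return []

    def handle_result(self, state: List[Any], record: Any, result: Any) -> None:
        if result is not None:  # simplified: a null result drops the record
            state.append(result)


class ScriptedPartitionRecord(AbstractScriptedRecord):
    def begin(self) -> Dict[str, List[Any]]:
        return {}

    def handle_result(self, state: Dict[str, List[Any]], record: Any, result: Any) -> None:
        if result is None:
            return  # null means drop
        names = result if isinstance(result, (list, tuple, set)) else [result]
        for name in names:
            state.setdefault(str(name), []).append(record)
```

Only `begin` and `handle_result` differ between the two subclasses, which is the overlap the issue expects an abstract base class to absorb.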
--
This message was sent by Atlassian Jira
(v8.20.7#820007)