Chris Sampson created NIFI-7194:
-----------------------------------
Summary: Record Path (and Readers) should be able to select/filter
on fields within arrays
Key: NIFI-7194
URL: https://issues.apache.org/jira/browse/NIFI-7194
Project: Apache NiFi
Issue Type: Improvement
Affects Versions: 1.11.1
Reporter: Chris Sampson
Given a sample input like
{code:json}
{
"field": "value",
"a_date": "2020-02-01",
"ids": [
{
"id": "1",
"id_date": "2020-02-24"
},
{
"id": "2",
"id_date": "2020-02-23"
}
],
"another": {
"some_date": "2020-02-02"
}
}
{code}
It would be useful if one could use NiFi's RecordPath to generically reference
fields nested within arrays, as well as within top-level/nested objects, in
order to update their values, e.g. as part of a call to the UpdateRecord
processor.
For example, to reference all non-array fields whose name contains `date`, one
can currently use `//*[matchesRegex(fieldName(.), '(^|.+_)date(_.+|$)')]`.
However, there is no equivalent for referencing the date fields within
arrays.
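To make the regular expression above concrete, here is a minimal Python sketch (not NiFi code) that applies the same pattern to the field names from the sample input. It assumes the pattern must match the whole field name, hence `re.fullmatch`; the helper name `is_date_field` is hypothetical. Note that it only tests names, regardless of where a field sits in the record, so `id_date` matches by name even though the RecordPath above does not reach it inside the array.

```python
import re

# Pattern copied verbatim from the RecordPath in the issue text.
# Assumption: the regex has to match the entire field name.
DATE_FIELD = re.compile(r"(^|.+_)date(_.+|$)")


def is_date_field(name: str) -> bool:
    """Return True if the field name is 'date' or has 'date' as an underscore-separated part."""
    return DATE_FIELD.fullmatch(name) is not None


# Every field name that appears in the sample input.
names = ["field", "a_date", "ids", "id", "id_date", "another", "some_date"]
matching = [n for n in names if is_date_field(n)]
print(matching)
```

The pattern accepts `date` on its own or as an underscore-delimited part of a name, so a name such as `update` is not treated as a date field.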
Such a RecordPath could look like `//[0..-1]/*[matchesRegex(fieldName(.),
'(^|.+_)date(_.+|$)')]`, but this is currently marked as invalid by the
UpdateRecord processor's validation (and presumably therefore doesn't work).
One tricky aspect would be how to handle multiple levels of nested
arrays/objects, and how deep a descent through the Record tree should be
possible.
A further addition would be to allow matching of Map fields too, e.g. with
something like `//[]/*[matchesRegex(fieldName(.), '(^|.+_)date(_.+|$)')]`.
An example use-case is a flow where the incoming data structure is not fixed or
known in advance and may contain arrays (or even maps) with date fields. The
flow needs to support multiple date formats (e.g. yyyy-MM-dd and dd/MM/yyyy) but
wants to harmonise all dates to a single format for further processing (e.g.
dd/MM/yyyy).
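The intended outcome can be sketched outside NiFi as a recursive walk over the parsed JSON. This is only an illustration of the desired behaviour in plain Python, not a proposed implementation: the function names `harmonise` and `harmonise_date` are hypothetical, the regex is assumed to match the whole field name, and values that fit neither input format are left untouched.

```python
import re
from datetime import datetime

# Field-name pattern copied from the RecordPath examples in this issue.
DATE_FIELD = re.compile(r"(^|.+_)date(_.+|$)")

# yyyy-MM-dd and dd/MM/yyyy, written as Python strptime formats.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
TARGET_FORMAT = "%d/%m/%Y"


def harmonise_date(value: str) -> str:
    """Reformat a date string to TARGET_FORMAT; return it unchanged if unparseable."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return value


def harmonise(node):
    """Descend through objects and arrays, rewriting string fields whose name matches."""
    if isinstance(node, dict):
        return {
            key: harmonise_date(value)
            if isinstance(value, str) and DATE_FIELD.fullmatch(key)
            else harmonise(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [harmonise(item) for item in node]
    return node


# The sample input from the top of this issue.
record = {
    "field": "value",
    "a_date": "2020-02-01",
    "ids": [
        {"id": "1", "id_date": "2020-02-24"},
        {"id": "2", "id_date": "2020-02-23"},
    ],
    "another": {"some_date": "2020-02-02"},
}

result = harmonise(record)
print(result)
```

Because the walk recurses into every list and dict, it handles any depth of array/object nesting; a RecordPath-based solution would need to decide how much of that descent to allow.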