Chris Sampson created NIFI-7194:
-----------------------------------

             Summary: Record Path (and Readers) should be able to select/filter 
on fields within arrays
                 Key: NIFI-7194
                 URL: https://issues.apache.org/jira/browse/NIFI-7194
             Project: Apache NiFi
          Issue Type: Improvement
    Affects Versions: 1.11.1
            Reporter: Chris Sampson


Given a sample input like

{code:json}
{
        "field": "value",
        "a_date": "2020-02-01",
        "ids": [
                {
                        "id": "1",
                        "id_date": "2020-02-24"
                },
                {
                        "id": "2",
                        "id_date": "2020-02-23"
                }
        ],
        "another": {
                "some_date": "2020-02-02"
        }
}
{code}

It would be useful if one could use NiFi's Record Path to generically reference 
fields nested within arrays as well as top-level/nested objects in order to 
make updates to their values, e.g. as part of a call to the UpdateRecord 
processor.

For example, to reference all non-array fields whose name contains `date`, one 
can currently use `//*[matchesRegex(fieldName(.), '(^|.+_)date(_.+|$)')]`. 
However, there is no equivalent for referencing the date fields within 
arrays.
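
To make clear which field names that pattern selects, here is a minimal Python 
sketch that applies the same regex to the field names in the sample input. It 
assumes that `matchesRegex` must match the whole field name (hence 
`fullmatch`); `is_date_field` is a hypothetical helper and not part of NiFi.

```python
import re

# Same pattern as in the RecordPath above.
DATE_FIELD = re.compile(r"(^|.+_)date(_.+|$)")


def is_date_field(name: str) -> bool:
    # Assumption: matchesRegex requires a match on the entire field name,
    # so fullmatch is used rather than search.
    return DATE_FIELD.fullmatch(name) is not None


# Every field name that appears in the sample input, at any depth.
names = ["field", "a_date", "ids", "id", "id_date", "another", "some_date"]
matching = [n for n in names if is_date_field(n)]
print(matching)
```

Under that assumption the pattern selects names in which `date` is a whole 
underscore-delimited token (`a_date`, `id_date`, `some_date`), and not names 
that merely contain the letters, such as `update`.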

Such a RecordPath could look like `//[0..-1]/*[matchesRegex(fieldName(.), 
'(^|.+_)date(_.+|$)')]`, but this is currently marked as invalid by the 
UpdateRecord processor's validation (and so presumably doesn't work). One 
complication is how to handle multiple levels of nested arrays/objects, and 
how far descending through the Record tree should be possible.

A further addition would be to allow matching of Map fields too, e.g. with 
something like `//[]/*[matchesRegex(fieldName(.), '(^|.+_)date(_.+|$)')]`.


An example use case is a flow where the incoming data structure is not fixed 
or known in advance and may contain arrays (or even maps) with date fields. 
The flow needs to accept multiple date formats (e.g. yyyy-MM-dd and 
dd/MM/yyyy) but harmonise all dates to a single format (e.g. dd/MM/yyyy) for 
further processing.
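
The desired end result can be sketched outside NiFi as follows. This is only 
an illustration of the behaviour being requested, written in plain Python 
against the sample input; it is not how RecordPath or UpdateRecord work 
internally. `harmonise_value` and `harmonise_dates` are hypothetical names, 
and the `strptime` formats are assumed equivalents of yyyy-MM-dd and 
dd/MM/yyyy.

```python
import re
from datetime import datetime

# Field-name pattern from the RecordPath examples above.
DATE_FIELD = re.compile(r"(^|.+_)date(_.+|$)")
# Assumed strptime/strftime equivalents of yyyy-MM-dd and dd/MM/yyyy.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
TARGET_FORMAT = "%d/%m/%Y"


def harmonise_value(value):
    """Reformat a date string to TARGET_FORMAT; leave anything else as-is."""
    if not isinstance(value, str):
        return value
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # Not in this format, try the next one.
    return value


def harmonise_dates(node):
    """Walk objects and arrays at any depth, rewriting matching scalar fields."""
    if isinstance(node, dict):
        result = {}
        for name, value in node.items():
            if isinstance(value, (dict, list)):
                result[name] = harmonise_dates(value)
            elif DATE_FIELD.fullmatch(name):
                result[name] = harmonise_value(value)
            else:
                result[name] = value
        return result
    if isinstance(node, list):
        return [harmonise_dates(item) for item in node]
    return node


record = {
    "field": "value",
    "a_date": "2020-02-01",
    "ids": [
        {"id": "1", "id_date": "2020-02-24"},
        {"id": "2", "id_date": "2020-02-23"},
    ],
    "another": {"some_date": "2020-02-02"},
}
harmonised = harmonise_dates(record)
```

Here the walk descends without any depth limit, which is one possible answer 
to the question of how far through the Record tree a path should reach; a 
RecordPath-based solution might reasonably choose differently.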



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
