[ 
https://issues.apache.org/jira/browse/NIFI-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653323#comment-17653323
 ] 

Chris Sampson edited comment on NIFI-9206 at 1/1/23 3:33 PM:
-------------------------------------------------------------

Rebased the original PR(s) against the latest {{main}} and addressed previous 
comments.

The attached flow definition ( [^NIFI-9206.json] ) includes an 
{{AvroSchemaRegistry}} containing the same Record Schemas as those used in the 
{{TestRemoveRecordField}} unit tests, so people can test the new processor in a 
running NiFi instance.


> Create a processor that is capable of removing fields from records
> ------------------------------------------------------------------
>
>                 Key: NIFI-9206
>                 URL: https://issues.apache.org/jira/browse/NIFI-9206
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Peter Gyori
>            Assignee: Chris Sampson
>            Priority: Major
>         Attachments: NIFI-9206.json
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> A processor should be created that is capable of removing fields from records 
> (RemoveRecordField might be a name for it).
> The processor should have 3 properties:
>  * Record Reader (a Reader controller service could be specified)
>  * Record Writer (a Writer controller service could be specified)
>  * Field To Remove (expects a RecordPath that points to the field to be 
> removed)
> The processor should be able to accept additional dynamic properties that 
> specify further fields (by RecordPath) to be removed from the record.
> +*Example*+
> +input:+
> {code:java}
> {
>     "id": 1,
>     "name": "John",
>     "address": {
>         "zip": 1111,
>         "street": "Main",
>         "building": 11
>     }
> }
> {code}
> +Field to remove:+ /address/building
> +output:+
> {code:java}
> {
>     "id": 1,
>     "name": "John",
>     "address": {
>         "zip": 1111,
>         "street": "Main"
>     }
> }
> {code}
> The record's schema should be modified accordingly (removing the 
> /address/building field from the schema). Field removal should be permitted 
> regardless of the field being nullable or not.
> Generally, the removal of a field should include the field's removal from the 
> schema AND the data. The exception is when the removal is data-dependent (e.g. 
> the field should be removed only if its value equals "xyz"). In this case no 
> schema modification should occur.
> The processor should be able to remove one or more elements from arrays 
> (e.g.: /addresses[ 1 ] should remove the element at position 1 of the 
> addresses array). When removing a field from elements of an array, the 
> array's schema should only be modified if the removal is applied to all 
> elements of the array (i.e.: /addresses[ * ]/building should modify the 
> schema of the array, but /addresses[ 1 ]/building should not).
> The same rule should be applied when handling the Map datatype.
> If the record does not contain the field that is expected to be removed, the 
> record should still be transferred to the 'success' relationship, with no 
> modification. The expectation is that if /x/y is to be removed from the 
> record, then the record leaving the processor should not contain the /x/y 
> field.
> If a field can be of different types (e.g. the address field can be a 
> "string" as well as a "record", and possibly another "record" with a 
> different schema), then when /address/building is to be removed, the 
> processor is expected to remove the building field from the schema of every 
> possible type of the address field, regardless of the concrete type the 
> address field has in the particular record being processed.
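
The removal rules in the description above can be illustrated with a minimal sketch. It uses plain Java maps instead of NiFi's Record and RecordPath APIs, so the class and method names ({{FieldRemovalSketch}}, {{removeField}}, {{affectsSchema}}) are hypothetical and are not taken from the PR. It assumes paths made of plain field names for the data removal, and treats only a single numeric index such as [ 1 ] as "no schema change"; predicates, ranges and union types are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the RemoveRecordField implementation.
class FieldRemovalSketch {

    // Matches a single numeric array index such as [1] or [ 1 ].
    private static final Pattern SINGLE_INDEX = Pattern.compile("\\[\\s*\\d+\\s*\\]");

    // Removes the field addressed by a path like "/address/building".
    // Returns false and leaves the record unchanged if the path is absent.
    @SuppressWarnings("unchecked")
    static boolean removeField(Map<String, Object> record, String path) {
        String[] parts = path.substring(1).split("/");
        Map<String, Object> current = record;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = current.get(parts[i]);
            if (!(child instanceof Map)) {
                return false; // parent is missing or is not a nested record
            }
            current = (Map<String, Object>) child;
        }
        String leaf = parts[parts.length - 1];
        if (!current.containsKey(leaf)) {
            return false;
        }
        current.remove(leaf);
        return true;
    }

    // Simplified rule from the description: a removal that targets one
    // specific array element must not change the schema; a wildcard or
    // a plain field path may.
    static boolean affectsSchema(String path) {
        return !SINGLE_INDEX.matcher(path).find();
    }

    // Builds the example input record from the issue description.
    static Map<String, Object> sampleRecord() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("zip", 1111);
        address.put("street", "Main");
        address.put("building", 11);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", "John");
        record.put("address", address);
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = sampleRecord();
        System.out.println(removeField(record, "/address/building")); // true
        System.out.println(record); // {id=1, name=John, address={zip=1111, street=Main}}
        System.out.println(removeField(record, "/x/y")); // false, record unchanged
        System.out.println(affectsSchema("/addresses[ * ]/building")); // true
        System.out.println(affectsSchema("/addresses[ 1 ]/building")); // false
    }
}
```

A missing path returns false rather than throwing, which mirrors the requirement that such records still go to 'success' unmodified.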



