[jira] [Comment Edited] (NIFI-14331) Unknown embedded fields not dropped by JSON Writer as expected by specified schema

Daniel Stieglitz (Jira) Wed, 05 Mar 2025 14:45:06 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932772#comment-17932772
 ]


Daniel Stieglitz edited comment on NIFI-14331 at 3/5/25 10:44 PM:
------------------------------------------------------------------

It looks like the main issue is in the convertJsonNodeToRecord method of 
JsonTreeRowRecordReader, which has the signature
{code:java}
private Record convertJsonNodeToRecord(final JsonNode jsonNode, final RecordSchema schema, final String fieldNamePrefix,
                                       final boolean coerceTypes, final boolean dropUnknown) throws IOException, MalformedRecordException{code}
The variable jsonNodeForSerialization (of type JsonNode) is not updated as 
the method recurses through the JSON, while the record data stored in the 
variable values (of type Map<String, Object>) holds the correctly filtered 
data. Hence, at the end of the method, the MapRecord created with the 
following lines:
{code:java}
final Supplier<String> supplier = jsonNodeForSerialization::toString;
return new MapRecord(schema, values, SerializedForm.of(supplier, "application/json"), false, dropUnknown);{code}
has the incorrect serialized form of the JSON.
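
To illustrate the mismatch outside of NiFi, here is a minimal sketch in which nested Maps stand in for both JsonNode and the record values. The class StaleSerializedFormSketch, its methods, and the map-based "schema" are all hypothetical and are not taken from the NiFi code base; the sketch only shows how a serialized form built from the unfiltered source can disagree with recursively filtered values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only: nested Maps stand in for JsonNode and for the record values.
final class StaleSerializedFormSketch {

    // "schema" maps each known field name to its nested schema (an empty map for scalar fields).
    @SuppressWarnings("unchecked")
    static Map<String, Object> dropUnknown(final Map<String, Object> node, final Map<String, Object> schema) {
        final Map<String, Object> values = new LinkedHashMap<>();
        for (final Map.Entry<String, Object> entry : node.entrySet()) {
            final Object childSchema = schema.get(entry.getKey());
            if (childSchema == null) {
                continue; // field not in the schema: dropped from the values
            }
            final Object child = entry.getValue();
            if (child instanceof Map) {
                // Recurse so that unknown fields inside known objects are dropped too.
                values.put(entry.getKey(), dropUnknown((Map<String, Object>) child, (Map<String, Object>) childSchema));
            } else {
                values.put(entry.getKey(), child);
            }
        }
        return values;
    }

    static Map<String, Object> sampleInput() {
        final Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "John");
        name.put("undefinedKeyInObject", "x");
        final Map<String, Object> root = new LinkedHashMap<>();
        root.put("name", name);
        root.put("undefinedKey", "y");
        return root;
    }

    static Map<String, Object> sampleSchema() {
        final Map<String, Object> nameSchema = new LinkedHashMap<>();
        nameSchema.put("first", Map.of());
        final Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("name", nameSchema);
        return schema;
    }

    public static void main(final String[] args) {
        final Map<String, Object> source = sampleInput();
        final Map<String, Object> values = dropUnknown(source, sampleSchema());
        // Analogous to jsonNodeForSerialization::toString: built from the unfiltered source.
        final Supplier<String> stale = source::toString;
        System.out.println("values:     " + values);
        System.out.println("serialized: " + stale.get());
    }
}
```

In this sketch the filtered values no longer contain "undefinedKeyInObject", but the supplier still prints it, because it was bound to the original, unfiltered structure.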

[~pvillard] [~exceptionfactory] how critical is it to have the serialized form 
of the JSON in the returned MapRecord? When fields are not dropped, I agree it 
makes sense to have it, but when fields are dropped, as I pointed out, the 
serialized form is not correct. In a test I modified this method to return
{code:java}
return new MapRecord(schema, values, null, false, dropUnknown);{code}
when fields are dropped and 
{code:java}
return new MapRecord(schema, values, SerializedForm.of(supplier, "application/json"), false, dropUnknown);{code}
when fields are not dropped and I get the correct JSON. Is that okay?
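
The idea can be sketched independently of NiFi as follows. This is only an illustration under two assumptions: that "fields are dropped" is decided by the dropUnknown flag, and that the writer prefers a cached serialized form when one is present and otherwise serializes the values. ConditionalSerializedFormSketch, SketchRecord, toRecord and write are hypothetical names, and plain Maps again stand in for the JSON and the record values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these types exist in NiFi.
final class ConditionalSerializedFormSketch {

    // Stand-in for a record: filtered values plus an optional (nullable) cached serialization.
    static final class SketchRecord {
        final Map<String, Object> values;
        final Supplier<String> serializedForm; // null when no cached form is kept

        SketchRecord(final Map<String, Object> values, final Supplier<String> serializedForm) {
            this.values = values;
            this.serializedForm = serializedForm;
        }
    }

    // Keep the cached form only when nothing can have been dropped from the source.
    static SketchRecord toRecord(final Map<String, Object> source, final Map<String, Object> values,
                                 final boolean dropUnknown) {
        if (dropUnknown) {
            return new SketchRecord(values, null);
        }
        final Supplier<String> supplier = source::toString;
        return new SketchRecord(values, supplier);
    }

    // Assumed writer behavior: use the cached form if present, else serialize the values.
    static String write(final SketchRecord rec) {
        return rec.serializedForm != null ? rec.serializedForm.get() : rec.values.toString();
    }

    public static void main(final String[] args) {
        final Map<String, Object> source = new LinkedHashMap<>();
        source.put("a", 1);
        source.put("extra", 2);
        final Map<String, Object> values = new LinkedHashMap<>();
        values.put("a", 1);

        System.out.println(write(toRecord(source, values, true)));
        System.out.println(write(toRecord(source, values, false)));
    }
}
```

The trade-off in this sketch is that records read with dropUnknown enabled lose the cached form and have to be re-serialized from their values.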



> Unknown embedded fields not dropped by JSON Writer as expected by specified 
> schema
> ----------------------------------------------------------------------------------
>
>                 Key: NIFI-14331
>                 URL: https://issues.apache.org/jira/browse/NIFI-14331
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Daniel Stieglitz
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: convertRecordResults.json, person.avsc, 
> person_dropfield.json
>
>
> NIFI-13843 aimed to eliminate any fields found in the JSON that are not 
> defined in a specified Avro schema. While that fix seems to have solved the 
> issue for top-level items, it did not solve it for an undefined key 
> within a defined object, or for an undefined key in a defined object inside 
> an array. Attached are the person.avsc Avro schema and person_dropfield.json, 
> which includes undefined top-level fields: a single key-value pair 
> ("undefinedKey"), an array ("undefinedScalarArray"), an object 
> ("undefinedObject") and an object array ("undefinedObjectArray"). It also 
> includes an undefined field ("undefinedKeyInObject") inside the defined 
> top-level "name" object and an undefined field ("undefinedKeyInObject") in a 
> "job" object found in the "jobs" array. The result after calling 
> ConvertRecord can be seen in the attached convertRecordResults.json. Note 
> that the fields "undefinedKey", "undefinedScalarArray", "undefinedObject" and 
> "undefinedObjectArray" all get dropped, while the "undefinedKeyInObject" 
> fields still exist in the "name" object and in the "job" object inside the 
> "jobs" array.
> Currently this behavior is seen in both ConvertRecord and MergeRecord when 
> both are configured with a JsonTreeReader and JsonRecordSetWriter.
> It is interesting to note that in NiFi 1.28.1 this behavior is seen only for 
> MergeRecord, while ConvertRecord drops all unknown fields.



