[
https://issues.apache.org/jira/browse/NIFI-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325186#comment-17325186
]
ASF subversion and git services commented on NIFI-8365:
-------------------------------------------------------
Commit a50957161cef12a63a1ff76bcaf718ecab2e71b5 in nifi's branch
refs/heads/main from Tamas Palfy
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a509571 ]
NIFI-8365 Fix JSON AbstractJsonRowRecordReader to handle deep CHOICE-typed
records properly. The previous logic selected the first compatible schema,
which can lack fields that are present in the actual value. Search for a
stricter match first and fall back to the existing logic only if none is found.
- AbstractJsonRowRecordReader - Handle a multi-array CHOICE type (that is, log
a warning instead of failing completely) when the data has extra fields not
defined by the schema and the correct type cannot be determined.
- AvroTypeUtil - Allow multiple different record types in an Avro union type.
Minor refactors. Added documentation for EqualsWrapper.
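
The "strict match first, then fall back" selection described in the commit
message can be sketched roughly as follows. This is a simplified illustration
and not the actual NiFi code: the class ChoiceSchemaSelector, its method names,
and the representation of a schema as a map from field name to Java class are
all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; a "schema" here is just field name -> expected Java type.
public class ChoiceSchemaSelector {

    // Lenient check: fields missing from the value are treated as null and
    // fields the schema does not define are ignored.
    public static boolean isLenientMatch(Map<String, Class<?>> schema, Map<String, Object> value) {
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object fieldValue = value.get(field.getKey());
            if (fieldValue != null && !field.getValue().isInstance(fieldValue)) {
                return false;
            }
        }
        return true;
    }

    // Strict check: a lenient match whose field names equal the value's exactly.
    public static boolean isStrictMatch(Map<String, Class<?>> schema, Map<String, Object> value) {
        return schema.keySet().equals(value.keySet()) && isLenientMatch(schema, value);
    }

    // Stand-in for the old behaviour: take the first lenient match.
    public static Optional<Map<String, Class<?>>> selectFirstCompatible(
            List<Map<String, Class<?>>> candidates, Map<String, Object> value) {
        for (Map<String, Class<?>> candidate : candidates) {
            if (isLenientMatch(candidate, value)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // New behaviour: look for a strict match first, fall back only if none exists.
    public static Optional<Map<String, Class<?>>> select(
            List<Map<String, Class<?>>> candidates, Map<String, Object> value) {
        for (Map<String, Class<?>> candidate : candidates) {
            if (isStrictMatch(candidate, value)) {
                return Optional.of(candidate);
            }
        }
        return selectFirstCompatible(candidates, value);
    }

    public static void main(String[] args) {
        Map<String, Class<?>> intBool = Map.of("integer", Integer.class, "boolean", Boolean.class);
        Map<String, Class<?>> intString = Map.of("integer", Integer.class, "string", String.class);
        List<Map<String, Class<?>>> candidates = List.of(intBool, intString);
        Map<String, Object> second = Map.of("integer", 2, "string", "stringValue2");

        // The old logic settles on the first candidate even though it lacks "string".
        System.out.println(selectFirstCompatible(candidates, second).get() == intBool); // true
        // The strict pass finds the schema whose fields match exactly.
        System.out.println(select(candidates, second).get() == intString); // true
    }
}
```

With the two candidate schemas from the issue below, the lenient-only
selection picks the integer/boolean schema for the second record, while the
strict-first selection picks the integer/string schema.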
> JSON record reader mishandles deep CHOICE types
> -----------------------------------------------
>
> Key: NIFI-8365
> URL: https://issues.apache.org/jira/browse/NIFI-8365
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Tamas Palfy
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> When the AbstractJsonRowRecordReader tries to find the correct schema for a
> given record, it may come up with the wrong one.
> For example, suppose the following record:
> {code:json}
> {
>   "dataCollection": [
>     {
>       "record": {
>         "integer": 1,
>         "boolean": true
>       }
>     },
>     {
>       "record": {
>         "integer": 2,
>         "string": "stringValue2"
>       }
>     }
>   ]
> }
> {code}
> Even if the schema is set correctly (which may not be the case, as schema
> inference itself has a similar issue),
> the second record
> {code:json}
> {
>   "record": {
>     "integer": 2,
>     "string": "stringValue2"
>   }
> }
> {code}
> will be assigned the schema of the first (["integer" : "INT", "boolean" :
> "BOOLEAN"] instead of ["integer" : "INT", "string" : "STRING"]).
> This will cause the fields that are not present in the schema (in this case
> "string") to be omitted when writing it out.
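
To illustrate the data loss described above, here is a minimal sketch of
writing a record through a schema. It is not NiFi's writer code:
SchemaProjection and writeWithSchema are hypothetical names, and writing a
schema field that is absent from the record as null is an assumption of this
example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a writer that only emits fields named by the schema.
public class SchemaProjection {

    // Keep only the fields the schema names; anything else in the record is dropped.
    // Assumption: a schema field missing from the record is written as null.
    public static Map<String, Object> writeWithSchema(List<String> schemaFields,
                                                      Map<String, Object> record) {
        Map<String, Object> written = new LinkedHashMap<>();
        for (String field : schemaFields) {
            written.put(field, record.get(field));
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("integer", 2);
        second.put("string", "stringValue2");

        // Wrong schema (taken from the first record): "string" is lost.
        System.out.println(writeWithSchema(List.of("integer", "boolean"), second));
        // prints {integer=2, boolean=null}

        // Matching schema: every field survives.
        System.out.println(writeWithSchema(List.of("integer", "string"), second));
        // prints {integer=2, string=stringValue2}
    }
}
```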