Yuanhao Zhu created NIFI-14496:
----------------------------------
Summary: ConvertRecord processor cannot convert Avro bytes typed
field to string properly
Key: NIFI-14496
URL: https://issues.apache.org/jira/browse/NIFI-14496
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: Yuanhao Zhu
When using the ConvertRecord processor in 2.x, we found that it cannot
convert an Avro bytes field to a string properly.
The setup is as follows: ConvertRecord uses an Avro reader that takes its
schema from the schema embedded in the Avro file. The record writer is a
JsonRecordSetWriter that uses a custom schema, copied from the Avro file's
schema except that the "Body" field is declared as string (in the Avro file's
embedded schema, "Body" is declared as bytes).
In 1.x the "Body" field was converted into a string containing JSON
objects, which we then processed further with EvaluateJsonPath. In 2.x,
however, the "Body" field always comes out as something like
"[Ljava.lang.Object;@279aa943", which is the default toString value of
an Object array.
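The symptom can be reproduced outside NiFi with a minimal standalone sketch. This is not NiFi code: the class name and the box helper are hypothetical, and it assumes the "Body" value arrives as an Object array of boxed bytes, which is what the output above suggests.

```java
import java.nio.charset.StandardCharsets;

class ObjectArrayToStringDemo {
    // Hypothetical helper: boxes each byte into an Object[], as a stand-in
    // for how the bytes field appears to reach the string conversion.
    static Object[] box(byte[] bytes) {
        Object[] boxed = new Object[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            boxed[i] = bytes[i]; // autoboxed to java.lang.Byte
        }
        return boxed;
    }

    public static void main(String[] args) {
        Object[] body = box("{\"id\":1}".getBytes(StandardCharsets.UTF_8));
        // Arrays do not override Object.toString(), so this prints the type
        // tag "[Ljava.lang.Object;" followed by "@" and an identity hash,
        // not the JSON text.
        System.out.println(body.toString());
    }
}
```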
After some investigation in the NiFi repo, I think the reason is that the 1.x
DataTypeUtils conversion has a toString method that also handles the case
where the incoming value is an array of objects,
[https://github.com/apache/nifi/blob/883338fe28883733417d10f6ffa9319e75f5ea06/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L975]
where it converts each of the objects into a string. In 2.x, where
the conversion has moved to ObjectStringFieldConverter.java,
[https://github.com/apache/nifi/blob/0fde8be07270e41433d07fa1e3f940b1a08674d9/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/field/ObjectStringFieldConverter.java#L102]
this case is not covered, so the default toString method of the
incoming object is invoked instead, which explains why we see
"[Ljava.lang.Object;@279aa943" in 2.x.
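To illustrate the kind of handling that seems to be missing, here is a rough standalone sketch. It is not the actual DataTypeUtils or ObjectStringFieldConverter code: the class BytesAwareToString and its convert method are hypothetical, and it assumes every element of the Object array is a boxed numeric byte value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesAwareToString {
    // Hypothetical conversion: decodes byte[] and Object[] of boxed bytes
    // with the given charset instead of falling back to Object.toString().
    static String convert(Object value, Charset charset) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return new String((byte[]) value, charset);
        }
        if (value instanceof Object[]) { // also matches Byte[]
            Object[] elements = (Object[]) value;
            byte[] bytes = new byte[elements.length];
            for (int i = 0; i < elements.length; i++) {
                if (!(elements[i] instanceof Number)) {
                    throw new IllegalArgumentException(
                            "Element " + i + " is not a byte value: " + elements[i]);
                }
                bytes[i] = ((Number) elements[i]).byteValue();
            }
            return new String(bytes, charset);
        }
        // Everything else keeps the default string form.
        return value.toString();
    }

    public static void main(String[] args) {
        Object[] body = new Object[] {(byte) '{', (byte) '}'};
        System.out.println(convert(body, StandardCharsets.UTF_8));
    }
}
```

With a branch like this, an Object array of bytes would be decoded back into the original JSON text rather than rendered as the array's identity string.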
I am not sure why the Avro reader reads the byte array in as an Object array,
though. Would you mind taking a look at it? Thanks!