Nikolay Nikolaev created NIFI-8292:
--------------------------------------
Summary: ParquetReader can't read a FlowFile written by
ParquetRecordSetWriter
Key: NIFI-8292
URL: https://issues.apache.org/jira/browse/NIFI-8292
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.13.0, 1.11.4
Environment: docker
Reporter: Nikolay Nikolaev
Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log
h1. Steps to reproduce the bug
# Start NiFi in Docker:
{code}docker pull apache/nifi:latest
docker run -p 8083:8080 --name nifi_container_latest \
  -v <your path to logs-folder>:/opt/nifi/nifi-current/logs \
  -v <your path to file-folder>:/file_folder \
  apache/nifi:latest{code}
# upload the template [^Test_Parquet_Reader_Writer.xml] (see attachments)
# create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*
# enable all 4 controller services in the NiFi flow configuration
# start the flow
# observe an error in the "ConvertRecord (Parquet_to_JSON)" processor
# stop the flow
# check the *logs-folder* (see nifi-app.log) and the *file_folder* (contains
the Parquet and JSON files). nifi-app.log will contain an error like the
following (see [^cut_from_nifi-app.log] for the full message):
{quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8]
o.a.n.processors.standard.ConvertRecord
ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process
StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default,
section=1], offset=128,
length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007];
will route to failure: org.apache.avro.SchemaParseException: Can't redefine:
list
org.apache.avro.SchemaParseException: Can't redefine: list
at org.apache.avro.Schema$Names.put(Schema.java:1128)
at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
at ...{quote}
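For context, Avro does not allow two named types with the same name inside one schema; the "Can't redefine: list" message comes from exactly that duplicate-name check (the Schema$Names.put frame above). A minimal standalone sketch, independent of NiFi and with illustrative field names, reproduces the same exception:
{code}
import org.apache.avro.Schema;

public class CantRedefineListDemo {
    public static void main(String[] args) {
        // Two nested array-element records are both named "list";
        // Avro's duplicate-name check rejects the second definition.
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"root\",\"fields\":[" +
            "{\"name\":\"array1\",\"type\":{\"type\":\"array\",\"items\":" +
            "{\"type\":\"record\",\"name\":\"list\",\"fields\":[" +
            "{\"name\":\"array2\",\"type\":{\"type\":\"array\",\"items\":" +
            "{\"type\":\"record\",\"name\":\"list\",\"fields\":[" +
            "{\"name\":\"element\",\"type\":\"string\"}]}}}]}}}]}";
        // Throws org.apache.avro.SchemaParseException: Can't redefine: list
        new Schema.Parser().parse(schemaJson);
    }
}
{code}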
h1. Description
This test flow generates 3 JSONs via the GenerateFlowFile processor:
Simple JSON:
{code}
{
  "field1": "value_field",
  "feild2": "value_field2"
}
{code}
1st JSON:
{code}
{
  "field1": "value_field",
  "array1": [
    {
      "feild2": "value_field2"
    }
  ]
}
{code}
2nd JSON:
{code}
{
  "field": "value_field",
  "array1": [
    {
      "array2": ["a_value_array2", "b_value_array2"]
    }
  ]
}
{code}
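For reference, the Avro schema that a JSON record reader would plausibly infer for the 2nd JSON looks like the sketch below (the record name "array1_element" is illustrative, not taken from NiFi's code). Note that it is a perfectly valid Avro schema on its own, with no duplicated type names, so the name collision must be introduced later, on the Parquet-to-Avro path:
{code}
import org.apache.avro.Schema;

public class InferredSchemaDemo {
    public static void main(String[] args) {
        // Hypothetical inferred schema for the 2nd JSON; every named type
        // is unique within the schema, so parsing and printing succeed.
        String inferred =
            "{\"type\":\"record\",\"name\":\"root\",\"fields\":[" +
            "{\"name\":\"field\",\"type\":\"string\"}," +
            "{\"name\":\"array1\",\"type\":{\"type\":\"array\",\"items\":" +
            "{\"type\":\"record\",\"name\":\"array1_element\",\"fields\":[" +
            "{\"name\":\"array2\",\"type\":{\"type\":\"array\"," +
            "\"items\":\"string\"}}]}}}]}";
        System.out.println(new Schema.Parser().parse(inferred).toString(true));
    }
}
{code}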
The flow then converts each JSON into Parquet (via ConvertRecord
(JSON_to_Parquet)) and back into JSON (via ConvertRecord (Parquet_to_JSON)). To
facilitate analysis, the JSON and Parquet files are saved to the file_folder.
In the file_folder we can see that all JSONs were successfully converted into
Parquet files. But only "Simple JSON" and the 1st JSON were converted back into
JSON; the 2nd JSON causes an error in ConvertRecord.
So, in certain cases ParquetReader can't read a file that was created by
ParquetRecordSetWriter, for example in the case of the 2nd JSON (which has a
more deeply nested structure), as the sketch below illustrates.
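A plausible mechanism, sketched below under the assumption that ParquetReader converts the file's Parquet schema back to Avro via parquet-avro's AvroSchemaConverter (the exact code path and converter behavior vary by version): the writer encodes each nested array as a Parquet LIST group whose repeated inner group is conventionally named "list", and converting that structure back to Avro can name both element records "list", which then trips the duplicate-name check from the log when the schema is serialized.
{code}
import org.apache.avro.Schema;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class NestedListRoundTripDemo {
    public static void main(String[] args) {
        // A Parquet schema comparable to what the writer would produce
        // for the 2nd JSON: two nested LIST groups, each with a repeated
        // inner group named "list".
        MessageType parquetSchema = MessageTypeParser.parseMessageType(
            "message root {\n" +
            "  optional binary field (UTF8);\n" +
            "  optional group array1 (LIST) {\n" +
            "    repeated group list {\n" +
            "      optional group element {\n" +
            "        optional group array2 (LIST) {\n" +
            "          repeated group list {\n" +
            "            optional binary element (UTF8);\n" +
            "          }\n" +
            "        }\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}");
        Schema avro = new AvroSchemaConverter().convert(parquetSchema);
        // Depending on the parquet-avro version, both element records may be
        // named "list", so serializing the schema fails as in the log above.
        System.out.println(avro.toString(true));
    }
}
{code}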
This bug is reproducible in versions 1.11.4 and 1.13.0.
In version 1.12.1 I couldn't reproduce it because of NIFI-7817.