Nikolay Nikolaev created NIFI-8292:
--------------------------------------

             Summary: ParquetReader can't read FlowFile written by 
ParquetRecordSetWriter
                 Key: NIFI-8292
                 URL: https://issues.apache.org/jira/browse/NIFI-8292
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.13.0, 1.11.4
         Environment: docker
            Reporter: Nikolay Nikolaev
         Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log

h1. Steps to reproduce the bug
# Start NiFi in Docker:
{code}
docker pull apache/nifi:latest
docker run -p 8083:8080 --name nifi_container_latest \
  -v <*your path to logs-folder*>:/opt/nifi/nifi-current/logs \
  -v <*your path to file-folder*>:/file_folder \
  apache/nifi:latest
{code}
# upload the template [^Test_Parquet_Reader_Writer.xml] (see attachment)
# create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*
# enable all 4 controller services in the NiFi Flow Configuration
# start the flow
# observe an error in the "ConvertRecord(JSON_to_Parquet)" processor
# stop the flow
# check the *logs-folder* (see nifi-app.log) and the *file_folder* (contains parquet files and JSON files). nifi-app.log will contain an error like the following (for the full message see [^cut_from_nifi-app.log]):
{quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8] 
o.a.n.processors.standard.ConvertRecord 
ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process 
StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default, 
section=1], offset=128, 
length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007]; 
will route to failure: org.apache.avro.SchemaParseException: Can't redefine: 
list
org.apache.avro.SchemaParseException: Can't redefine: list
        at org.apache.avro.Schema$Names.put(Schema.java:1128)
        at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
        at ...{quote}
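The stack trace points at Avro's registry of named types: {{Schema$Names.put}} throws when the same type name is registered a second time. The sketch below is an illustrative stand-in for that mechanism, not NiFi or Avro code; the class and values are hypothetical and only mimic the failure mode visible in the trace.

```python
# Illustrative stand-in for Avro's Schema$Names (seen in the stack trace):
# Avro keeps all named types in one flat registry, so registering the same
# record name twice raises "Can't redefine: <name>". This is NOT NiFi/Avro
# source code, just a sketch of the failure mechanics.
class Names(dict):
    def put(self, name, schema):
        if name in self:
            raise ValueError(f"Can't redefine: {name}")
        self[name] = schema

names = Names()
names.put("list", {"type": "record", "name": "list"})      # outer array wrapper
try:
    names.put("list", {"type": "record", "name": "list"})  # inner array wrapper
except ValueError as e:
    print(e)  # -> Can't redefine: list
```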

h1. Description
This test flow generates 3 JSONs via the GenerateFlowFile processor:
Simple JSON:
{code}
{
  "field1": "value_field",
  "feild2": "value_field2"
}
{code}
1st JSON:
{code}
        { 
          "field1": "value_field",
          "array1": [
                {
                  "feild2": "value_field2"
                }
          ]
        }
{code}
2nd JSON:
{code}
        { 
          "field": "value_field",
          "array1": [
                {
                  "array2": ["a_value_array2","b_value_array2"
                  ]
                }
          ]
        }
{code}
The flow then converts each JSON into Parquet (via ConvertRecord(JSON_to_Parquet)) and back to 
JSON (via ConvertRecord(Parquet_to_JSON)). To facilitate analysis, the JSON and 
Parquet files are saved to the file_folder.
In the file_folder we can see that all JSONs were successfully converted into 
parquet files. But only the "Simple JSON" and the "1st JSON" were converted 
back to JSON. The "2nd JSON" causes an error in ConvertRecord.
So, in certain cases ParquetReader can't read a file that was created by 
ParquetRecordSetWriter, for example in the case of the "2nd JSON" (which has a 
more complex nesting structure).
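For context, parquet-avro conventionally wraps each repeated group in a synthetic record named {{list}}. The sketch below is my reconstruction of what the round-tripped Avro-style schema for the "2nd JSON" could look like (the schema dict and helper are hypothetical, not output captured from NiFi); it shows that doubly nested arrays yield two records with the same name, which a single flat name registry rejects.

```python
# Reconstructed (hypothetical) Avro-style schema for the "2nd JSON":
# array1 is a list of records, each holding array2, itself a list of strings.
schema = {
    "type": "record", "name": "root", "fields": [
        {"name": "field", "type": "string"},
        {"name": "array1", "type": {"type": "array", "items": {
            "type": "record", "name": "list",          # synthetic wrapper, level 1
            "fields": [
                {"name": "array2", "type": {"type": "array", "items": {
                    "type": "record", "name": "list",  # same name again, level 2
                    "fields": [{"name": "element", "type": "string"}],
                }}},
            ],
        }}},
    ],
}

def record_names(node, acc):
    """Collect every named record in the schema tree, in definition order."""
    if isinstance(node, dict):
        if node.get("type") == "record":
            acc.append(node["name"])
        for value in node.values():
            record_names(value, acc)
    elif isinstance(node, list):
        for value in node:
            record_names(value, acc)
    return acc

names = record_names(schema, [])
print(names)   # -> ['root', 'list', 'list']
dupes = {n for n in names if names.count(n) > 1}
print(dupes)   # -> {'list'}: the source of "Can't redefine: list"
```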

This bug is reproducible in versions 1.11.4 and 1.13.0.
In version 1.12.1 I couldn't reproduce it because of NIFI-7817.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
