[ 
https://issues.apache.org/jira/browse/NIFI-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard resolved NIFI-7495.
----------------------------------
    Resolution: Feedback Received

Apache NiFi 1.x is no longer maintained and no new release is planned on the 
1.x release line. Marking as resolved as part of a cleanup operation. Please 
open a new issue with an updated description if this is still relevant for 
NiFi 2.x.

> PutParquet processor generating invalid files - Can not read value at 0 in 
> block -1 in file - Encoding DELTA_BINARY_PACKED is only supported for type 
> INT32
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-7495
>                 URL: https://issues.apache.org/jira/browse/NIFI-7495
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.11.4
>         Environment: Red Hat Enterprise Linux Server 7.6 (Maipo)
>            Reporter: Henrique Neves do Nascimento
>            Priority: Major
>         Attachments: HIVE-stacktrace.txt
>
>
> The PutParquet processor generates invalid Parquet files when the flow file 
> has few records.
>  
> When the flow file contains only the header plus 1-3 records, PutParquet 
> succeeds and the file is written to HDFS, but the file is invalid. When the 
> flow file contains many records, PutParquet also succeeds and the generated 
> files can be read.
> I tried to open the invalid Parquet files with parquet-tools, Hive, and 
> PySpark, and all of them fail with the same error: “Can not read value at 0 
> in block -1 in file”.
> Hive also reports this error in its log file: Caused by: 
> parquet.io.ParquetDecodingException: Encoding DELTA_BINARY_PACKED is only 
> supported for type INT32.
>  
> To reproduce the problem, I used a GetFile processor followed by PutParquet 
> writing to HDFS, on NiFi version 1.11.4.
>  
> Here is an example of the content of a file that is written but invalid (I 
> changed some characters):
>  
> timestamp,ggsn,apn,msisdn,statustype,ip,sessionid,duration
> 1589236199000,186.4.75.1,webapn.company.com,44895956521,Start,177945774,979cdf6b021ed038,-1,
>  
> And an example of a success case:
>  
> timestamp,ggsn,apn,msisdn,statustype,ip,sessionid,duration
> 1589569200000,186.6.64.1,webapn.company.com,12395856026,Start,176224166,989dhe2808a0e10c,-1,
> 1589569200000,186.6.96.1,webapn.company.com,12393446203,Stop,177119485,989dhe6904515cf7,3712000,
> 1589569200000,186.6.0.3,webapn.company.com,12394359006,Stop,-1407442482,989dhe0f010282f1,7092000,
> 1589569200000,186.6.96.1,webapn.company.com,12394427751,Start,177550761,989dhe6904dd35df,-1,
> 1589569200000,186.6.64.1,webapn.company.com,12393309416,Start,176616344,989dhe2703f93f8a,-1,
> 1589569200000,186.6.0.3,webapn.company.com,12394355488,Start,176177290,989dhe10505a9af1,-1,
> 1589569200000,186.6.64.1,webapn.company.com,12395478656,Start,176688933,989dhe2703f93f8b,-1,
> 1589569200000,186.6.96.1,webapn.company.com,12395214244,Start,172288204,989dhe6900c48aa7,-1,
> 1589569200000,186.6.64.1,webapn.company.com,12393418526,Stop,176335286,989dhe27081d0fa1,50000,
> 1589569200000,186.6.96.1,webapn.company.com,12394828264,Start,177952229,989dhe6900c48aa8,-1,
> 1589569200000,152.146.0.1,webapn.company.com,12394416031,Stop,-1405606344,989dhe49ccja1399,58000,
> 1589569200000,186.6.96.1,webapn.company.com,12394589217,Start,177743029,989dhe6a04ee2123,-1,
> 1589569200000,152.146.0.1,webapn.company.com,12394859666,Start,-1407233995,989dhe4916be3ee9,-1,
> 1589569200000,152.146.0.1,webapn.company.com,12393735602,Stop,-1407845029,c83b809dde72f30a,402000,
>  
> My PutParquet is configured to write files UNCOMPRESSED, version PARQUET_2_0, 
> and TRUE for avro configs. It also uses a CSVReader as the record reader, 
> with this schema:
> {
> "namespace": "nifi",
> "name": "logs_radius",
> "type": "record",
> "fields": [
>   { "name": "timestamp", "type": "long" },
>   { "name": "ggsn", "type": "string" },
>   { "name": "apn", "type": "string" },
>   { "name": "msisdn", "type": "string" },
>   { "name": "statustype", "type": "string" },
>   { "name": "ip", "type": "int" },
>   { "name": "sessionid", "type": "string" },
>   { "name": "duration", "type": "long" }
> ]
> }
>  
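
For anyone re-filing this against NiFi 2.x, it may help to first rule out the
input data. The following is a minimal sketch, not part of the original
report, that checks CSV rows against the Avro schema quoted above using only
the Python standard library. The helper name check_rows and the RANGES table
are hypothetical, introduced here for illustration. It assumes the trailing
comma on each data row is ignorable, since the rows in the success case
carry it as well.

```python
import csv
import io
import json

# Avro schema as quoted in the report (Jira's backslash escapes removed).
SCHEMA = json.loads("""
{
  "namespace": "nifi",
  "name": "logs_radius",
  "type": "record",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "ggsn", "type": "string"},
    {"name": "apn", "type": "string"},
    {"name": "msisdn", "type": "string"},
    {"name": "statustype", "type": "string"},
    {"name": "ip", "type": "int"},
    {"name": "sessionid", "type": "string"},
    {"name": "duration", "type": "long"}
  ]
}
""")

# Signed ranges of the two Avro integer types used by the schema.
RANGES = {"int": (-2**31, 2**31 - 1), "long": (-2**63, 2**63 - 1)}


def check_rows(csv_text, schema=SCHEMA):
    """Return (line_number, field, problem) tuples; an empty list means all rows fit."""
    fields = schema["fields"]
    expected = [f["name"] for f in fields]
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if header != expected:
        problems.append((1, "<header>", f"expected {expected}, got {header}"))
    for line_no, row in enumerate(reader, start=2):
        # Assumption: a trailing comma (one extra empty cell) is ignorable.
        if len(row) == len(fields) + 1 and row[-1] == "":
            row = row[:-1]
        if len(row) != len(fields):
            problems.append((line_no, "<row>", f"{len(row)} cells"))
            continue
        for field, cell in zip(fields, row):
            kind = field["type"]
            if kind not in RANGES:
                continue  # strings need no numeric check
            try:
                value = int(cell)
            except ValueError:
                problems.append((line_no, field["name"], f"not an integer: {cell!r}"))
                continue
            low, high = RANGES[kind]
            if not low <= value <= high:
                problems.append((line_no, field["name"], f"{value} out of range for {kind}"))
    return problems


# The single-record sample that produced an unreadable file in the report.
FAILING_SAMPLE = (
    "timestamp,ggsn,apn,msisdn,statustype,ip,sessionid,duration\n"
    "1589236199000,186.4.75.1,webapn.company.com,44895956521,Start,"
    "177945774,979cdf6b021ed038,-1,\n"
)

print(check_rows(FAILING_SAMPLE))
```

An empty list for the failing sample only shows that its values fit the
declared int and long types. It says nothing about how PutParquet encodes
them, so it narrows the search rather than identifying the cause.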



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
