[
https://issues.apache.org/jira/browse/ARROW-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oliver Klein updated ARROW-17352:
---------------------------------
Description:
Parquet files cannot be opened in Windows Parquet Viewer when stored with Arrow
Version 9.0.0. It worked when stored with version 8 and earlier.
Windows Parquet Viewer: 2.3.5 and 2.3.6
pyarrow version: 9.0.0
Error: System.AggregateException: One or more errors occured. --->
Parquet.ParquetException: encoding RLE_DICTIONARY is not supported.
at Parquet.File.DataColumnReader.ReadColumn(BinaryReader reader ... in
DataColumnReader.cs: line 259
After further checking, the problem seems to be related to a change in the
default Parquet format version.
When I use pyarrow 9 and set version to 1.0, the file opens in the Windows
tool again; with version 2.4 it does not work (or is not supported by the tool).
df.to_parquet(r'C:\temp\test_10.parquet', version='1.0')
df.to_parquet(r'C:\temp\test_24.parquet', version='2.4')
The question is whether this change of default is a bug or a feature.
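To make the comparison above easier to reproduce, here is a minimal sketch that writes the same small table with both format versions and reports the column encodings recorded in each file's metadata. The helper names (parquet_writer_kwargs, encodings_by_version) and the sample column are made up for illustration, and I have not checked the result against Parquet Viewer itself. The expectation is only that the 2.4 file lists RLE_DICTIONARY while the 1.0 file does not.

```python
import os
import tempfile


def parquet_writer_kwargs(reader_supports_rle_dictionary):
    """Hypothetical helper: pin the format version only for older readers."""
    # An empty dict leaves pyarrow's default version in place.
    return {} if reader_supports_rle_dictionary else {"version": "1.0"}


def encodings_by_version(versions=("1.0", "2.4")):
    """Write one small file per format version and collect its encodings."""
    # pyarrow is imported here so the helper above works without it installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"name": ["a", "b", "a", "b"]})  # made-up sample data
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        for version in versions:
            path = os.path.join(tmp, "test_" + version.replace(".", "") + ".parquet")
            pq.write_table(table, path, version=version)
            # Encodings of the first column chunk in the first row group.
            column = pq.read_metadata(path).row_group(0).column(0)
            result[version] = set(column.encodings)
    return result


if __name__ == "__main__":
    for version, encodings in encodings_by_version().items():
        print(version, sorted(encodings))
```

With pandas, the same keyword arguments can be passed through, for example df.to_parquet(path, **parquet_writer_kwargs(False)), which is equivalent to the version='1.0' call above.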
was:
Parquet files cannot be opened in Windows Parquet Viewer when stored with Arrow
Version 9.0.0. It worked when stored with version 8 and earlier.
Windows Parquet Viewer: 2.3.5 and 2.3.6
pyarrow version: 9.0.0
Error: System.AggregateException: One or more errors occured. --->
Parquet.ParquetException: encoding RLE_DICTIONARY is not supported.
at Parquet.File.DataColumnReader.ReadColumn(BinaryReader reader ... in
DataColumnReader.cs: line 259
> Parquet files cannot be opened in Windows Parquet Viewer when stored with
> Arrow Version 9.0.0
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-17352
> URL: https://issues.apache.org/jira/browse/ARROW-17352
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet
> Affects Versions: 9.0.0
> Environment: Windows10
> Reporter: Oliver Klein
> Priority: Critical
> Attachments: arrow9error.PNG
>
>
> Parquet files cannot be opened in Windows Parquet Viewer when stored with
> Arrow Version 9.0.0. It worked when stored with version 8 and earlier.
> Windows Parquet Viewer: 2.3.5 and 2.3.6
> pyarrow version: 9.0.0
> Error: System.AggregateException: One or more errors occured. --->
> Parquet.ParquetException: encoding RLE_DICTIONARY is not supported.
> at Parquet.File.DataColumnReader.ReadColumn(BinaryReader reader ... in
> DataColumnReader.cs: line 259
>
> After further checking, the problem seems to be related to a change in the
> default Parquet format version.
> When I use pyarrow 9 and set version to 1.0, the file opens in the Windows
> tool again; with version 2.4 it does not work (or is not supported by the
> tool).
> df.to_parquet(r'C:\temp\test_10.parquet', version='1.0')
> df.to_parquet(r'C:\temp\test_24.parquet', version='2.4')
> The question is whether this change of default is a bug or a feature.
>