[ https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064816#comment-17064816 ]
Marc Bernot commented on ARROW-7939:
------------------------------------
Indeed, I tried this on macOS and it did not give any error, so this seems to
be Windows-specific.
I checked whether the foo.parquet files generated on Windows and on macOS are
exactly the same, and they are.
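One way to check that two files are byte-for-byte identical across machines is to compare a cryptographic digest of each. A minimal sketch using only the standard library (the helper name `sha256_of` is illustrative, not part of pyarrow):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `print(sha256_of("foo.parquet"))` on each machine and comparing the printed strings is enough; equal digests mean the files are identical for all practical purposes.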
I experienced this on two separate Windows environments:
* Windows 10 Home, 64-bit, Intel Atom x5-Z8500 CPU @ 1.44 GHz
As you did, I started from a clean Miniconda environment, then ran
{{conda create -c conda-forge -n arrow-3x016 python=3.x pyarrow=0.16}} with x = 6, 7, 8
** All these Python versions result in the same crash.
* Windows 7, 64-bit, HP Z820 workstation, Intel Xeon CPU E5-2630
** Here I tested only python=3.6.9.
Would any other information about my environment be useful?
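If more environment details are needed, a small script can gather the same kind of information listed above (OS, CPU, bitness, Python and pyarrow versions) in one go. This is a minimal sketch: `environment_report` is a hypothetical helper built on the standard `platform`, `struct` and `sys` modules, and it only assumes that pyarrow exposes `__version__`.

```python
import platform
import struct
import sys


def environment_report():
    """Collect basic platform details that are often requested in bug reports."""
    try:
        import pyarrow
        pyarrow_version = pyarrow.__version__
    except ImportError:
        # pyarrow is not installed in this interpreter.
        pyarrow_version = None
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string on some systems
        "python": platform.python_version(),
        "pointer_bits": struct.calcsize("P") * 8,  # 64 for a 64-bit interpreter
        "pyarrow": pyarrow_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
    print("executable: %s" % sys.executable)
```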
> [Python] crashes when reading parquet file compressed with snappy
> -----------------------------------------------------------------
>
> Key: ARROW-7939
> URL: https://issues.apache.org/jira/browse/ARROW-7939
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.16.0
> Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
> Reporter: Marc Bernot
> Priority: Major
> Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1
> made Python crash. I drilled down to the simplest example I could find.
> It turns out that some parquet files created with pyarrow 0.16 cannot be read
> back either. The example below works fine with arrays_ok, but Python crashes
> with arrays_nok (apparently as soon as there are at least three distinct
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression; the
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> # Either of these reads back fine:
> arrays_ok = [[0, 1]]
> arrays_ok = [[0, 1, 1]]
> # Three distinct values: reading back crashes Python with snappy
> arrays_nok = [[0, 1, 2]]
> table = pa.Table.from_arrays(arrays_nok, names=['a'])
> pq.write_table(table, 'foo.parquet', compression='snappy')
> pq.read_table('foo.parquet')
> {code}
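To narrow this down without restarting Python after every crash, each write/read round trip can run in its own child interpreter, so a native crash only shows up as a non-zero exit code (negative on POSIX when the child is killed by a signal). This is a sketch, not part of the original report: it assumes pyarrow is installed in the same interpreter, reuses the value sets and the four codecs from the description above, and the helper name `round_trip_exit_codes` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Child script: write a one-column table with the given codec and values,
# then read it back. A crash here kills only the child process.
CHILD_SCRIPT = """
import sys
import pyarrow as pa
import pyarrow.parquet as pq

codec, path = sys.argv[1], sys.argv[2]
values = [int(v) for v in sys.argv[3].split(",")]
table = pa.Table.from_arrays([pa.array(values)], names=["a"])
pq.write_table(table, path, compression=codec)
assert pq.read_table(path).equals(table)
"""


def round_trip_exit_codes(codecs, value_sets):
    """Return {(codec, values): exit code}; 0 means the round trip succeeded."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.parquet")
        for codec in codecs:
            for values in value_sets:
                arg = ",".join(str(v) for v in values)
                proc = subprocess.run(
                    [sys.executable, "-c", CHILD_SCRIPT, codec, path, arg]
                )
                results[(codec, tuple(values))] = proc.returncode
    return results


if __name__ == "__main__":
    codecs = ("none", "gzip", "brotli", "snappy")
    value_sets = ([0, 1], [0, 1, 1], [0, 1, 2])
    for (codec, values), code in round_trip_exit_codes(codecs, value_sets).items():
        status = "ok" if code == 0 else "exit code %d" % code
        print(codec, list(values), status)
```

Running each case in a separate process costs an interpreter start-up per case, but it lets a single run report every codec and value combination even when some of them crash.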
--
This message was sent by Atlassian Jira
(v8.3.4#803005)