[jira] [Commented] (ARROW-7939) [Python] crashes when reading parquet file compressed with snappy

Michael Peleshenko (Jira) Tue, 12 Jan 2021 09:33:27 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263542#comment-17263542
 ]


Michael Peleshenko commented on ARROW-7939:
-------------------------------------------

We seem to be facing the same crash with the sample code here, but with the 
pyarrow 2.0.0 pip wheel for Windows and Python 3.8.
On an Intel Xeon Silver 4114 CPU, I have no issues.
On an Intel Xeon E5-2620, my Python crashes.

According to 
https://ark.intel.com/content/www/us/en/ark/products/64594/intel-xeon-processor-e5-2620-15m-cache-2-00-ghz-7-20-gt-s-intel-qpi.html
the Xeon E5-2620 does not support AVX2, while the Xeon Silver 4114 does, so I 
suspect we are hitting the same issue here.

Assuming this is the same issue, has the snappy fix been included in the 
pyarrow pip wheel build for Windows?

> [Python] crashes when reading parquet file compressed with snappy
> -----------------------------------------------------------------
>
>                 Key: ARROW-7939
>                 URL: https://issues.apache.org/jira/browse/ARROW-7939
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>            Reporter: Marc Bernot
>            Assignee: Uwe Korn
>            Priority: Major
>             Fix For: 1.0.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It turns out that some parquet files created with pyarrow 0.16 cannot be read 
> back either. The example below works fine with arrays_ok but python crashes 
> with arrays_nok (apparently as soon as there are at least three different 
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
> problem seems to happen only with snappy.
> {code:python}
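> import pyarrow as pa
> import pyarrow.parquet as pq
> # Either of these inputs works fine (the second assignment overwrites the first)
> arrays_ok = [[0,1]]
> arrays_ok = [[0,1,1]]
> # Three different values: python crashes on read
> arrays_nok = [[0,1,2]]
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> # The crash happens here
> pq.read_table('foo.parquet')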
> {code}
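
Because the failure is a hard crash of the interpreter rather than a Python
exception, it can help to run the reproduction in a child process, once per
compression codec, and compare exit codes across machines. The following is
only a minimal sketch under stated assumptions: the helper names
(run_isolated, check_codecs, REPRO_TEMPLATE) are hypothetical and not part of
pyarrow, the codec list is taken from the issue description, and pyarrow is
assumed to be installed for the interpreter that runs the child processes.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical template modelled on the reproduction quoted in the issue.
# {path!r} and {codec!r} are filled in with str.format before execution.
REPRO_TEMPLATE = textwrap.dedent("""
    import pyarrow as pa
    import pyarrow.parquet as pq
    table = pa.Table.from_arrays([pa.array([0, 1, 2])], names=['a'])
    pq.write_table(table, {path!r}, compression={codec!r})
    pq.read_table({path!r})
""")


def run_isolated(source):
    """Run `source` in a fresh interpreter and return its exit code.

    A crash in native code only kills the child process, so the caller
    survives and can record the result.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode


def check_codecs(codecs=("none", "gzip", "brotli", "snappy")):
    """Return a dict mapping each codec to the child's exit code."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for codec in codecs:
            path = os.path.join(tmpdir, "foo_" + codec + ".parquet")
            source = REPRO_TEMPLATE.format(path=path, codec=codec)
            results[codec] = run_isolated(source)
    return results


if __name__ == "__main__":
    for codec, code in check_codecs().items():
        print(codec + ": exit code " + str(code))
```

An exit code of 0 means the round trip completed. A nonzero code only shows
that the child failed; it also covers ordinary errors such as pyarrow not
being installed, so it does not by itself confirm the AVX2 explanation.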



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
