[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887129#comment-16887129 ]
H. Vetinari commented on ARROW-5965:
------------------------------------

Hey Neal,

I tried a couple of times before filing the report, and all (~5) invocations on 0.14 crashed, and all invocations on 0.13 worked. The machine itself has lots of memory, so I don't think it's that.

Not sure I'll be able to pare this down to a minimal reproducing parquet file. I'll try.

> [Python] Regression: segfault when reading hive table with v0.14
> ----------------------------------------------------------------
>
>                 Key: ARROW-5965
>                 URL: https://issues.apache.org/jira/browse/ARROW-5965
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: H. Vetinari
>            Priority: Critical
>              Labels: parquet
>
> I'm working with pyarrow on a cloudera cluster (CDH 6.1.1), with pyarrow
> installed in a conda env.
> The data I'm reading is a hive(-registered) table written as parquet, and
> with v0.13, reading this table (that is partitioned) does not cause any
> issues.
> The code that worked before and now crashes with v0.14 is simply:
> ```
> import pyarrow.parquet as pq
> pq.ParquetDataset('hdfs:///data/raw/source/table').read()
> ```
> Since it completely crashes my notebook (resp. my REPL ends with "Killed"), I
> cannot report much more, but this is a pretty severe usability restriction.
> So far the solution is to enforce `pyarrow<0.14`

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
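
Since the crash takes the notebook or REPL down with it, one way to gather more detail might be to run the failing read in a child interpreter, so the parent session survives and can see how the child ended. The following is only a sketch under assumptions, not something from the report: `run_isolated`, `describe_exit` and `REPRO` are hypothetical names, the timeout value is arbitrary, and it relies on POSIX behaviour where a child killed by signal N is reported with return code -N. It uses only the standard library; the pyarrow call is the one quoted in the issue and is executed only in the child.

```python
import subprocess
import sys

# The two lines from the report, run verbatim in the child process.
REPRO = (
    "import pyarrow.parquet as pq\n"
    "pq.ParquetDataset('hdfs:///data/raw/source/table').read()\n"
)


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a child interpreter (hypothetical helper).

    Returns (returncode, stderr). "-X faulthandler" asks the child to dump
    a Python traceback to stderr on fatal signals such as SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Turn a child return code into a short human-readable summary."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX convention in subprocess: -N means "killed by signal N".
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode
```

Calling `code, err = run_isolated(REPRO)` and then printing `describe_exit(code)` together with `err` would distinguish a segfault (signal 11) from an out-of-memory kill (typically signal 9, which matches a REPL ending with "Killed"), and the same helper could be pointed at individual partition files while trying to pare the data down.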