[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887676#comment-16887676 ]
H. Vetinari edited comment on ARROW-5965 at 7/18/19 6:35 AM:
-------------------------------------------------------------

[~wesmckinn] Thanks for the tips. Unfortunately, I can't follow that example because the code does not generate a core dump but only prints "Killed". I found some ways to run it in gdb that *should* work (as best I can tell), like

{{gdb -ex r --args python fail.py}}

or interactively:

{{gdb python}}
{{(gdb) run fail.py}}

but I always get:

{{[...]}}
{{warning: Could not trace the inferior process}}
{{Error:}}
{{warning: ptrace: Operation not permitted}}
{{During startup program exited with code 127.}}

Not sure if that's a mistake on my side or something in the setup/interplay of conda and gdb.

> [Python] Regression: segfault when reading hive table with v0.14
> ----------------------------------------------------------------
>
>                 Key: ARROW-5965
>                 URL: https://issues.apache.org/jira/browse/ARROW-5965
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: H. Vetinari
>            Priority: Critical
>              Labels: parquet
>
> I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow
> installed in a conda env.
> The data I'm reading is a hive(-registered) table written as parquet, and
> with v0.13, reading this table (which is partitioned) does not cause any
> issues.
> The code that worked before and now crashes with v0.14 is simply:
> ```
> import pyarrow.parquet as pq
> pq.ParquetDataset('hdfs:///data/raw/source/table').read()
> ```
> Since it completely crashes my notebook (in a plain REPL, it just ends with
> "Killed"), I cannot report much more, but this is a pretty severe usability
> restriction. So far the workaround is to pin `pyarrow<0.14`.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
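Since the gdb route above fails on ptrace permissions, one alternative (not suggested in the original thread; a sketch only) is Python's built-in {{faulthandler}} module, which dumps the Python-level traceback to stderr when the process receives a fatal signal such as SIGSEGV, without needing ptrace at all. Caveat: a bare "Killed" message usually means SIGKILL (e.g. from the kernel OOM killer), which no in-process handler can intercept, so this only helps if the crash really is a segfault.

```python
# Sketch: enable faulthandler before the crashing call so that a
# segfault inside native code (e.g. pyarrow/parquet-cpp) prints the
# Python traceback to stderr before the process dies.
import faulthandler

faulthandler.enable()  # installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL
print(faulthandler.is_enabled())  # -> True

# The crashing call from the report would then follow, e.g.:
# import pyarrow.parquet as pq
# pq.ParquetDataset('hdfs:///data/raw/source/table').read()
```

This does not replace a real core dump or gdb session, but it at least pinpoints which Python call triggered the native crash.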