Jay Baywatch created ARROW-12547:
------------------------------------
Summary: Sigbus when using mmap in multiprocessing env over netapp
Key: ARROW-12547
URL: https://issues.apache.org/jira/browse/ARROW-12547
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 3.0.0
Reporter: Jay Baywatch
We have noticed a condition where using Arrow to read Parquet files that reside
on our NetApp from Slurm jobs (via Python) occasionally raises signal 7 (SIGBUS).
We have not yet tried disabling memory mapping, although we do expect that
turning memory mapping off in read_table will resolve the issue.
This seems to occur when we read a file that has just been written, even though
we write Parquet files to a transient location and then swap the file in
using os.rename.
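The write-then-swap pattern described above can be sketched roughly as follows.
This is an illustration only, not our production code: the function name
atomic_write_bytes is hypothetical, the fsync before the swap is an assumption
(the report does not say whether the writer does this), and os.replace is used
in place of os.rename because it also overwrites an existing target.

```python
import os
import tempfile


def atomic_write_bytes(final_path, data):
    """Write data to a temporary file, then rename it over final_path."""
    # Create the temporary file in the target directory so the rename
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Assumption: flush to storage before the file becomes visible.
            os.fsync(f.fileno())
        # Swap the finished file into place under its final name.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that a reader which already has the old file open (or mapped) when the
swap happens still refers to the old file, which may be relevant on a network
filesystem.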
All that said, we were not sure whether this is a known issue or whether the
pyarrow team is interested in the stack trace.
The relevant bits are:
Thread 1 (Thread 0x7fafa7dff700 (LWP 44408)):
#0 __memcpy_avx_unaligned () at
../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238
#1 0x00007fafb9c40aba in snappy::RawUncompress(snappy::Source*, char*) () from
/home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
#2 0x00007fafb9c41131 in snappy::RawUncompress(char const*, unsigned long,
char*) () from
/home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
#3 0x00007fafb942abbe in arrow::util::internal::(anonymous
namespace)::SnappyCodec::Decompress(long, unsigned char const*, long, unsigned
char*) () from
/home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
#4 0x00007fafb4d0965e in parquet::(anonymous
namespace)::SerializedPageReader::DecompressIfNeeded(std::shared_ptr<arrow::Buffer>,
int, int, int) () from
/home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300