Jay Baywatch created ARROW-12547:
------------------------------------

             Summary: Sigbus when using mmap in multiprocessing env over netapp
                 Key: ARROW-12547
                 URL: https://issues.apache.org/jira/browse/ARROW-12547
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 3.0.0
            Reporter: Jay Baywatch


We have noticed that using Arrow to read Parquet files that reside on our 
NetApp, from Python jobs running under Slurm, occasionally raises signal 7 
(SIGBUS).

We haven’t tried disabling memory mapping yet, although we do expect that 
turning memory mapping off in read_table will resolve the issue.
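
For reference, a minimal sketch of the read with memory mapping turned off. 
The helper name read_parquet_without_mmap is hypothetical; the only pyarrow 
call used is pyarrow.parquet.read_table with its memory_map argument. We have 
not yet confirmed that this avoids the signal.

```python
def read_parquet_without_mmap(path):
    """Read a Parquet file into a pyarrow.Table without memory-mapping it.

    Hypothetical helper; it only wraps pyarrow.parquet.read_table.
    """
    # Deferred import so the helper can be defined before pyarrow is loaded.
    import pyarrow.parquet as pq

    # memory_map=False asks pyarrow to read through ordinary file I/O
    # instead of mapping the file into the process address space.
    return pq.read_table(path, memory_map=False)
```

With this, a failed read on the NetApp mount would be expected to surface as 
a Python exception rather than a signal, but that is an assumption we still 
need to test.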

This seems to occur when we read a file that has just been written, even though 
we write Parquet files to a transient location and then swap the file in 
using os.rename.
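
The write-then-swap pattern looks roughly like the sketch below. The function 
name atomic_write_bytes is hypothetical, and the sketch makes two assumptions 
that are not stated above: the temporary file is created in the same directory 
as the destination, and os.replace is used in place of os.rename (the two 
behave the same on POSIX when the destination exists). The fsync call is also 
an addition for illustration only.

```python
import os
import tempfile


def atomic_write_bytes(data: bytes, final_path: str) -> None:
    """Write data to a temporary file next to final_path, then rename it in."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file in the destination directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush file contents before the swap
        os.replace(tmp_path, final_path)  # swap the finished file into place
    except BaseException:
        # Clean up the temporary file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers that open final_path by name see either the old file or the complete 
new one; this says nothing about processes that already have the old file 
mapped.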

 

All that said, we were not sure whether this is a known issue or whether team 
pyarrow is interested in the stack trace.
 
 The relevant bits are:

Thread 1 (Thread 0x7fafa7dff700 (LWP 44408)):

#0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238

#1  0x00007fafb9c40aba in snappy::RawUncompress(snappy::Source*, char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300

#2  0x00007fafb9c41131 in snappy::RawUncompress(char const*, unsigned long, char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300

#3  0x00007fafb942abbe in arrow::util::internal::(anonymous namespace)::SnappyCodec::Decompress(long, unsigned char const*, long, unsigned char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300

#4  0x00007fafb4d0965e in parquet::(anonymous namespace)::SerializedPageReader::DecompressIfNeeded(std::shared_ptr<arrow::Buffer>, int, int, int) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
