jorisvandenbossche commented on PR #40807:
URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088492861
The Python failures definitely seem related. After fetching this PR into my
local dev setup, I see the same segfault when running the Python tests. GDB backtrace:
```
$ gdb --args python -m pytest python/pyarrow/tests/test_extension_type.py::test_parquet_extension_with_nested_storage
...
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
arrow::ArrayData::device_type (this=0x0) at
/home/joris/scipy/repos/arrow/cpp/src/arrow/array/data.cc:234
234 for (const auto& buf : buffers) {
(gdb) bt
#0 arrow::ArrayData::device_type (this=0x0) at
/home/joris/scipy/repos/arrow/cpp/src/arrow/array/data.cc:234
#1 0x00007ffff4a51b60 in arrow::ArrayData::device_type (this=0x555556d4e6c0)
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:1665
#2 0x00007ffff4879610 in arrow::Array::device_type (this=<optimized out>)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/array/array_base.h:227
#3 arrow::SimpleRecordBatch::SimpleRecordBatch (this=0x555556ea58b0,
schema=..., num_rows=<optimized out>, columns=..., sync_event=...)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/record_batch.cc:69
#4 0x00007ffff4890323 in std::_Construct<arrow::SimpleRecordBatch,
std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (__p=<optimized out>)
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:119
#5 std::allocator_traits<std::allocator<void>
>::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (
__p=<optimized out>) at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/alloc_traits.h:635
#6 std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch,
std::allocator<void>,
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>,
long&, std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (__a=..., this=<optimized out>)
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:604
#7
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch,
std::allocator<void>, std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (__a=..., __p=<optimized out>,
this=<optimized out>)
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:971
#8 std::__shared_ptr<arrow::SimpleRecordBatch,
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>,
std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (__tag=..., this=<optimized out>)
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:1712
#9
std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>,
std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > (__tag=...,
this=<optimized out>) at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr.h:464
#10 std::make_shared<arrow::SimpleRecordBatch,
std::shared_ptr<arrow::Schema>, long&,
std::vector<std::shared_ptr<arrow::Array>,
std::allocator<std::shared_ptr<arrow::Array> > >,
std::shared_ptr<arrow::Device::SyncEvent> > ()
at
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr.h:1010
#11 arrow::RecordBatch::Make (schema=..., num_rows=num_rows@entry=0,
columns=..., sync_event=...)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/record_batch.cc:217
#12 0x00007fff48dcb9ae in arrow::dataset::(anonymous
namespace)::FragmentToBatches (fragment=..., options=...)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/dataset/scanner.cc:316
#13 0x00007fff48dcc364 in operator() (fragment=..., __closure=<optimized
out>)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/dataset/scanner.cc:333
```
I don't immediately understand why it crashes here, though. The problem seems
to be that the ArrayData itself is a nullptr (`this=0x0` in frame #0).
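
For anyone wanting to spot this pattern quickly in a long trace, here is a minimal sketch that scans gdb backtrace text for frames whose `this` argument is null. The helper name `null_this_frames` is hypothetical and not part of Arrow or gdb. It assumes each frame starts with `#N` at the beginning of a line and that wrapped continuation lines belong to the preceding frame.

```python
import re

# A gdb frame starts with "#<number>" at the beginning of a line; the number
# may be followed by whitespace or, when the frame is wrapped, by nothing.
FRAME_START = re.compile(r"^#(\d+)(?:\s|$)")

# A null `this` pointer is printed by gdb as "this=0x0".
NULL_THIS = re.compile(r"\bthis=0x0\b")


def null_this_frames(backtrace):
    """Return the sorted frame numbers whose `this` argument is 0x0."""
    frames = {}
    current = None
    for line in backtrace.splitlines():
        match = FRAME_START.match(line)
        if match:
            # New frame: remember its number and start collecting its text.
            current = int(match.group(1))
            frames[current] = line
        elif current is not None:
            # Continuation of a wrapped frame: append to the current frame.
            frames[current] += " " + line.strip()
    return [n for n, text in sorted(frames.items()) if NULL_THIS.search(text)]


# First two frames of the backtrace above, used as sample input.
sample = """\
#0 arrow::ArrayData::device_type (this=0x0) at
/home/joris/scipy/repos/arrow/cpp/src/arrow/array/data.cc:234
#1 0x00007ffff4a51b60 in arrow::ArrayData::device_type (this=0x555556d4e6c0)
at
"""

print(null_this_frames(sample))  # prints [0]
```

On the trace above, only frame #0 is reported, which matches the reading that `device_type` is being called on a null `ArrayData`.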