jorisvandenbossche commented on PR #40807:
URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088492861

   The Python failures definitely seem related. Fetching this PR in my local 
dev setup, I see the same segfault when running the Python tests. GDB backtrace:
   
   ```
   $ gdb --args python -m pytest python/pyarrow/tests/test_extension_type.py::test_parquet_extension_with_nested_storage
   ...
   Thread 1 "python" received signal SIGSEGV, Segmentation fault.
   arrow::ArrayData::device_type (this=0x0) at /home/joris/scipy/repos/arrow/cpp/src/arrow/array/data.cc:234
   234    for (const auto& buf : buffers) {
   (gdb) bt
   #0  arrow::ArrayData::device_type (this=0x0) at /home/joris/scipy/repos/arrow/cpp/src/arrow/array/data.cc:234
   #1  0x00007ffff4a51b60 in arrow::ArrayData::device_type (this=0x555556d4e6c0)
       at /home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:1665
   #2  0x00007ffff4879610 in arrow::Array::device_type (this=<optimized out>)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/array/array_base.h:227
   #3  arrow::SimpleRecordBatch::SimpleRecordBatch (this=0x555556ea58b0, schema=..., num_rows=<optimized out>, columns=..., sync_event=...)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/record_batch.cc:69
   #4  0x00007ffff4890323 in std::_Construct<arrow::SimpleRecordBatch, 
std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (__p=<optimized out>)
       at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:119
   #5  std::allocator_traits<std::allocator<void> 
>::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (
       __p=<optimized out>) at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/alloc_traits.h:635
   #6  std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch, 
std::allocator<void>, 
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>,
 long&, std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (__a=..., this=<optimized out>)
       at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:604
   #7  
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch,
 std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (__a=..., __p=<optimized out>, 
this=<optimized out>)
       at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:971
   #8  std::__shared_ptr<arrow::SimpleRecordBatch, 
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, 
std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (__tag=..., this=<optimized out>)
       at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr_base.h:1712
   #9  
std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>, 
std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > (__tag=..., 
       this=<optimized out>) at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr.h:464
   #10 std::make_shared<arrow::SimpleRecordBatch, 
std::shared_ptr<arrow::Schema>, long&, 
std::vector<std::shared_ptr<arrow::Array>, 
std::allocator<std::shared_ptr<arrow::Array> > >, 
std::shared_ptr<arrow::Device::SyncEvent> > ()
       at 
/home/joris/conda/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/shared_ptr.h:1010
   #11 arrow::RecordBatch::Make (schema=..., num_rows=num_rows@entry=0, columns=..., sync_event=...)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/record_batch.cc:217
   #12 0x00007fff48dcb9ae in arrow::dataset::(anonymous namespace)::FragmentToBatches (fragment=..., options=...)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/dataset/scanner.cc:316
   #13 0x00007fff48dcc364 in operator() (fragment=..., __closure=<optimized out>)
       at /home/joris/scipy/repos/arrow/cpp/src/arrow/dataset/scanner.cc:333
   ```
   
   However, I don't immediately understand why it is crashing here. The problem 
seems to be that the ArrayData itself is a nullptr (`this=0x0` in frame #0). 
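
   To illustrate the failure mode the backtrace points at, here is a minimal 
sketch using hypothetical stand-in types (these are NOT the real Arrow classes; 
`FirstNullColumn` is a made-up helper, not an existing Arrow function). It 
assumes that the record batch constructor asks every column for its device 
type, so a single column whose data pointer is null would be enough to crash 
on the first member access inside `device_type()`:

   ```cpp
   #include <cstddef>
   #include <memory>
   #include <optional>
   #include <vector>

   // Hypothetical stand-ins that only mirror the shape seen in the backtrace.
   enum class DeviceType { kCPU, kOther };

   struct Buffer {
     DeviceType device = DeviceType::kCPU;
   };

   struct ArrayData {
     std::vector<std::shared_ptr<Buffer>> buffers;

     DeviceType device_type() const {
       // Iterating `buffers` reads through `this`. If the method was reached
       // via a null pointer (this == nullptr), the crash happens on this loop.
       for (const auto& buf : buffers) {
         if (buf) return buf->device;
       }
       return DeviceType::kCPU;
     }
   };

   struct Array {
     std::shared_ptr<ArrayData> data;

     // No null check: undefined behavior (typically SIGSEGV) if `data` is null.
     DeviceType device_type() const { return data->device_type(); }
   };

   // Debugging aid (assumption, not Arrow API): index of the first column that
   // is itself null or has null data, or std::nullopt if all columns are safe.
   std::optional<std::size_t> FirstNullColumn(
       const std::vector<std::shared_ptr<Array>>& columns) {
     for (std::size_t i = 0; i < columns.size(); ++i) {
       if (!columns[i] || !columns[i]->data) return i;
     }
     return std::nullopt;
   }
   ```

   A check like this placed just before the batch is constructed (frame #12 
in the backtrace) could help narrow down which column arrives without data.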


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
