jorisvandenbossche commented on issue #39562:
URL: https://github.com/apache/arrow/issues/39562#issuecomment-1888701463

   Ignore the lldb output above; it is useless because of 
https://github.com/apache/arrow/issues/37589. Thanks to the workaround 
mentioned in 
https://stackoverflow.com/questions/74059978/why-is-lldb-generating-exc-bad-instruction-with-user-compiled-library-on-macos/76032052#76032052
 (`settings set platform.plugin.darwin.ignored-exceptions 
EXC_BAD_INSTRUCTION`), I could get an actual backtrace:
   
   ```
   (lldb) process launch
   Process 2066 launched: '/opt/homebrew/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/Resources/Python.app/Contents/MacOS/Python' (arm64)
   libc++abi: terminating due to uncaught exception of type std::length_error: vector
   Process 2066 stopped
   * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
       frame #0: 0x00000001a9b30744 libsystem_kernel.dylib`__pthread_kill + 8
   libsystem_kernel.dylib`:
   ->  0x1a9b30744 <+8>:  b.lo   0x1a9b30764               ; <+40>
       0x1a9b30748 <+12>: pacibsp 
       0x1a9b3074c <+16>: stp    x29, x30, [sp, #-0x10]!
       0x1a9b30750 <+20>: mov    x29, sp
   Target 0: (Python) stopped.
   (lldb) bt
   * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
     * frame #0: 0x00000001a9b30744 libsystem_kernel.dylib`__pthread_kill + 8
       frame #1: 0x00000001a9b67c28 libsystem_pthread.dylib`pthread_kill + 288
       frame #2: 0x00000001a9a75ae8 libsystem_c.dylib`abort + 180
       frame #3: 0x00000001a9b20b84 libc++abi.dylib`abort_message + 132
       frame #4: 0x00000001a9b103b4 libc++abi.dylib`demangling_terminate_handler() + 320
       frame #5: 0x00000001a97e6e68 libobjc.A.dylib`_objc_terminate() + 160
       frame #6: 0x00000001a9b1ff48 libc++abi.dylib`std::__terminate(void (*)()) + 16
       frame #7: 0x00000001a9b22d34 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 36
       frame #8: 0x00000001a9b22ce0 libc++abi.dylib`__cxa_throw + 140
       frame #9: 0x0000000148022f90 libarrow_dataset.1500.dylib`std::__1::__throw_length_error[abi:v160006](char const*) + 60
       frame #10: 0x00000001480635f8 libarrow_dataset.1500.dylib`std::__1::vector<bool, std::__1::allocator<bool>>::__throw_length_error[abi:v160006]() const + 20
       frame #11: 0x000000014801d4f8 libarrow_dataset.1500.dylib`std::__1::vector<bool, std::__1::allocator<bool>>::resize(unsigned long, bool) + 600
       frame #12: 0x000000014801d14c libarrow_dataset.1500.dylib`arrow::dataset::ParquetFileFragment::SetMetadata(std::__1::shared_ptr<parquet::FileMetaData>, std::__1::shared_ptr<parquet::arrow::SchemaManifest>) + 432
       frame #13: 0x000000014801d7e4 libarrow_dataset.1500.dylib`arrow::dataset::ParquetFileFragment::SplitByRowGroup(arrow::compute::Expression) + 720
       frame #14: 0x000000010763b824 _dataset_parquet.cpython-310-darwin.so`__pyx_pw_7pyarrow_16_dataset_parquet_19ParquetFileFragment_5split_by_row_group(_object*, _object* const*, long, _object*) + 1428
   ```
   
   So the vector resize in `ParquetFileFragment::SetMetadata` is throwing an 
uncaught `std::length_error: vector` exception. This is presumably the resize 
that I changed in https://github.com/apache/arrow/pull/39065:
   
   ```diff
   -    statistics_expressions_complete_.resize(physical_schema_->num_fields(), false);
   +    statistics_expressions_complete_.resize(manifest_->descr->num_columns(), false);
   ```
   
   I am wondering if `manifest_->descr->num_columns()` could sometimes be 
undefined?  
   The crash also happens specifically in a test where the dataset is created 
with `ParquetDatasetFactory`.
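   
   For context on why a bad column count would surface this way: `resize` 
throws `std::length_error` when the requested size exceeds `max_size()`, which 
is what happens if a negative or garbage count is converted to `size_t`. The 
following is a minimal standalone sketch, not Arrow code; 
`ResizeThrowsLengthError` is a hypothetical helper, and passing `-1` is only an 
assumed stand-in for whatever value `num_columns()` might return in the failing 
case:
   
   ```cpp
   #include <cstddef>
   #include <iostream>
   #include <stdexcept>
   #include <vector>

   // Hypothetical helper (not part of Arrow): resizes a vector<bool> to
   // `num_columns` and reports whether std::length_error was thrown.
   bool ResizeThrowsLengthError(int num_columns) {
     std::vector<bool> complete;
     try {
       // A negative int wraps to a huge size_t value, which is larger than
       // max_size(), so resize() throws before allocating anything.
       complete.resize(static_cast<std::size_t>(num_columns), false);
     } catch (const std::length_error& e) {
       std::cerr << "length_error: " << e.what() << "\n";
       return true;
     }
     return false;
   }
   ```
   
   With this sketch, `ResizeThrowsLengthError(3)` should return `false` and 
`ResizeThrowsLengthError(-1)` should return `true`. In the crash above the 
exception is not caught anywhere, so the process aborts instead.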
   
   It's still very strange that this only occurs in the macOS wheels. I found 
a potentially similar issue 
(https://github.com/pyg-team/pytorch_geometric/issues/4419), but it also has no 
clear solution (the guess there was that it was related to interference from 
system libraries, and it was typically solved by using a (virtual) environment).
   

