AlenkaF commented on issue #34539:
URL: https://github.com/apache/arrow/issues/34539#issuecomment-1476007043

   I can add to the list of environments: macOS Monterey 12.6.3 with Python 
3.10.10 (with Docker or without).
   
   In my case I get a segfault no matter how I set 
`min/max_rows_per_group`. With `min_rows_per_group=10_000` and 
`max_rows_per_group=10_000` the lldb backtrace is:
   ```
   Process 65402 stopped
   * thread #12, stop reason = EXC_BAD_ACCESS (code=2, address=0x1702efee8)
       frame #0: 0x000000011805a378 libarrow_dataset.1200.0.0.dylib`arrow::dataset::internal::DatasetWriter::DatasetWriterImpl::DoWriteRecordBatch(this=0x00000001702f0110, batch=<unavailable>, directory="", prefix="") at dataset_writer.cc:578
      575    }
      576 
      577    Future<> DoWriteRecordBatch(std::shared_ptr<RecordBatch> batch,
   -> 578                                const std::string& directory, const std::string& prefix) {
      579      ARROW_ASSIGN_OR_RAISE(
      580          auto dir_queue_itr,
      581          ::arrow::internal::GetOrInsertGenerated(
   Target 0: (Python) stopped.
   ```
   
   Without `min_rows_per_group` or `max_rows_per_group` set, the backtrace is 
very long:
   ```
   Process 75137 stopped
   * thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
       frame #0: 0x00000001998704fc libsystem_malloc.dylib`tiny_malloc_should_clear + 8
   libsystem_malloc.dylib`tiny_malloc_should_clear:
   ->  0x1998704fc <+8>:  stp    x28, x27, [sp, #0x60]
       0x199870500 <+12>: stp    x26, x25, [sp, #0x70]
       0x199870504 <+16>: stp    x24, x23, [sp, #0x80]
       0x199870508 <+20>: stp    x22, x21, [sp, #0x90]
   Target 0: (Python) stopped.
   (lldb) bt 5
   * thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
     * frame #0: 0x00000001998704fc libsystem_malloc.dylib`tiny_malloc_should_clear + 8
       frame #1: 0x000000019986f3a0 libsystem_malloc.dylib`szone_malloc_should_clear + 92
       frame #2: 0x000000019988b748 libsystem_malloc.dylib`_malloc_zone_malloc + 156
       frame #3: 0x0000000199a1c8b0 libc++abi.dylib`operator new(unsigned long) + 32
       frame #4: 0x000000010272a1d4 lib.cpython-310-darwin.so`void* std::__1::__libcpp_operator_new<unsigned long>(__args=16) at new:235:10
   (lldb) bt 15
   * thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
     * frame #0: 0x00000001998704fc libsystem_malloc.dylib`tiny_malloc_should_clear + 8
       frame #1: 0x000000019986f3a0 libsystem_malloc.dylib`szone_malloc_should_clear + 92
       frame #2: 0x000000019988b748 libsystem_malloc.dylib`_malloc_zone_malloc + 156
       frame #3: 0x0000000199a1c8b0 libc++abi.dylib`operator new(unsigned long) + 32
       frame #4: 0x000000010272a1d4 lib.cpython-310-darwin.so`void* std::__1::__libcpp_operator_new<unsigned long>(__args=16) at new:235:10
       frame #5: 0x000000010272a130 lib.cpython-310-darwin.so`std::__1::__libcpp_allocate(__size=16, __align=8) at new:261:10
       frame #6: 0x0000000102833a00 lib.cpython-310-darwin.so`std::__1::allocator<std::__1::shared_ptr<arrow::Array> >::allocate(this=0x0000000170a0c278, __n=1) at allocator.h:108:38
       frame #7: 0x0000000102833860 lib.cpython-310-darwin.so`std::__1::allocator_traits<std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >::allocate(__a=0x0000000170a0c278, __n=1) at allocator_traits.h:262:20
       frame #8: 0x0000000102833378 lib.cpython-310-darwin.so`std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >::__vallocate(this=0x0000000170a0c268 size=0, __n=1) at vector:1015:37
       frame #9: 0x0000000126e90fa0 libarrow.1200.0.0.dylib`std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >::vector(this=0x0000000170a0c268 size=0, __x=size=1) at vector:1280:9
       frame #10: 0x0000000126e90f30 libarrow.1200.0.0.dylib`std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >::vector(this=0x0000000170a0c268 size=0, __x=size=1) at vector:1273:1
       frame #11: 0x00000001271b3b14 libarrow.1200.0.0.dylib`std::__1::__shared_ptr_emplace<arrow::ChunkedArray, std::__1::allocator<arrow::ChunkedArray> >::__shared_ptr_emplace<std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&, std::__1::shared_ptr<arrow::DataType> const&>(this=0x000000014b883640, __a=allocator<arrow::ChunkedArray> @ 0x0000000170a0c2af, __args=size=1, __args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48 strong=5977 weak=2) at shared_ptr.h:298:41
       frame #12: 0x00000001271b3a80 libarrow.1200.0.0.dylib`std::__1::__shared_ptr_emplace<arrow::ChunkedArray, std::__1::allocator<arrow::ChunkedArray> >::__shared_ptr_emplace<std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&, std::__1::shared_ptr<arrow::DataType> const&>(this=0x000000014b883640, __a=allocator<arrow::ChunkedArray> @ 0x0000000170a0c2ef, __args=size=1, __args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48 strong=5977 weak=2) at shared_ptr.h:292:5
       frame #13: 0x00000001271b39c0 libarrow.1200.0.0.dylib`std::__1::shared_ptr<arrow::ChunkedArray> std::__1::allocate_shared<arrow::ChunkedArray, std::__1::allocator<arrow::ChunkedArray>, std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&, std::__1::shared_ptr<arrow::DataType> const&, void>(__a=0x0000000170a0c3c7, __args=size=1, __args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48 strong=5977 weak=2) at shared_ptr.h:1106:55
       frame #14: 0x00000001271a0920 libarrow.1200.0.0.dylib`std::__1::shared_ptr<arrow::ChunkedArray> std::__1::make_shared<arrow::ChunkedArray, std::__1::vector<std::__1::shared_ptr<arrow::Array>, std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&, std::__1::shared_ptr<arrow::DataType> const&, void>(__args=size=1, __args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48 strong=5977 weak=2) at shared_ptr.h:1115:12
   ```
   
   I had to set `TOTAL = 2**10` to avoid the segfault:
   ```
   <pyarrow._parquet.FileMetaData object at 0x12b79c270>
     created_by: parquet-cpp-arrow version 12.0.0-SNAPSHOT
     num_columns: 1
     num_rows: 1024
     num_row_groups: 1024
     format_version: 1.0
     serialized_size: 97276
   Process 75050 exited with status = 0 (0x00000000) 
   ```

