AlenkaF commented on issue #34539:
URL: https://github.com/apache/arrow/issues/34539#issuecomment-1476007043
I can add to the list of environments: macOS Monterey 12.6.3 with Python
3.10.10 (with Docker or without).
In my case I get a segfault no matter how I define
`min/max_rows_per_group`. With `min_rows_per_group=10_000` and
`max_rows_per_group=10_000` the lldb backtrace is:
```
Process 65402 stopped
* thread #12, stop reason = EXC_BAD_ACCESS (code=2, address=0x1702efee8)
frame #0: 0x000000011805a378 libarrow_dataset.1200.0.0.dylib`arrow::dataset::internal::DatasetWriter::DatasetWriterImpl::DoWriteRecordBatch(this=0x00000001702f0110, batch=<unavailable>, directory="", prefix="") at dataset_writer.cc:578
575 }
576
577 Future<> DoWriteRecordBatch(std::shared_ptr<RecordBatch> batch,
-> 578 const std::string& directory, const std::string& prefix) {
579 ARROW_ASSIGN_OR_RAISE(
580 auto dir_queue_itr,
581 ::arrow::internal::GetOrInsertGenerated(
Target 0: (Python) stopped.
```
Without `min_rows_per_group` or `max_rows_per_group` set, the backtrace is
very long:
```
Process 75137 stopped
* thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
frame #0: 0x00000001998704fc
libsystem_malloc.dylib`tiny_malloc_should_clear + 8
libsystem_malloc.dylib`tiny_malloc_should_clear:
-> 0x1998704fc <+8>: stp x28, x27, [sp, #0x60]
0x199870500 <+12>: stp x26, x25, [sp, #0x70]
0x199870504 <+16>: stp x24, x23, [sp, #0x80]
0x199870508 <+20>: stp x22, x21, [sp, #0x90]
Target 0: (Python) stopped.
(lldb) bt 5
* thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
* frame #0: 0x00000001998704fc
libsystem_malloc.dylib`tiny_malloc_should_clear + 8
frame #1: 0x000000019986f3a0
libsystem_malloc.dylib`szone_malloc_should_clear + 92
frame #2: 0x000000019988b748 libsystem_malloc.dylib`_malloc_zone_malloc
+ 156
frame #3: 0x0000000199a1c8b0 libc++abi.dylib`operator new(unsigned long)
+ 32
frame #4: 0x000000010272a1d4 lib.cpython-310-darwin.so`void*
std::__1::__libcpp_operator_new<unsigned long>(__args=16) at new:235:10
(lldb) bt 15
* thread #25, stop reason = EXC_BAD_ACCESS (code=2, address=0x170a0bfe0)
* frame #0: 0x00000001998704fc
libsystem_malloc.dylib`tiny_malloc_should_clear + 8
frame #1: 0x000000019986f3a0
libsystem_malloc.dylib`szone_malloc_should_clear + 92
frame #2: 0x000000019988b748 libsystem_malloc.dylib`_malloc_zone_malloc
+ 156
frame #3: 0x0000000199a1c8b0 libc++abi.dylib`operator new(unsigned long)
+ 32
frame #4: 0x000000010272a1d4 lib.cpython-310-darwin.so`void*
std::__1::__libcpp_operator_new<unsigned long>(__args=16) at new:235:10
frame #5: 0x000000010272a130
lib.cpython-310-darwin.so`std::__1::__libcpp_allocate(__size=16, __align=8) at
new:261:10
frame #6: 0x0000000102833a00
lib.cpython-310-darwin.so`std::__1::allocator<std::__1::shared_ptr<arrow::Array>
>::allocate(this=0x0000000170a0c278, __n=1) at allocator.h:108:38
frame #7: 0x0000000102833860
lib.cpython-310-darwin.so`std::__1::allocator_traits<std::__1::allocator<std::__1::shared_ptr<arrow::Array>
> >::allocate(__a=0x0000000170a0c278, __n=1) at allocator_traits.h:262:20
frame #8: 0x0000000102833378
lib.cpython-310-darwin.so`std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> >
>::__vallocate(this=0x0000000170a0c268 size=0, __n=1) at vector:1015:37
frame #9: 0x0000000126e90fa0
libarrow.1200.0.0.dylib`std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> >
>::vector(this=0x0000000170a0c268 size=0, __x=size=1) at vector:1280:9
frame #10: 0x0000000126e90f30
libarrow.1200.0.0.dylib`std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> >
>::vector(this=0x0000000170a0c268 size=0, __x=size=1) at vector:1273:1
frame #11: 0x00000001271b3b14
libarrow.1200.0.0.dylib`std::__1::__shared_ptr_emplace<arrow::ChunkedArray,
std::__1::allocator<arrow::ChunkedArray>
>::__shared_ptr_emplace<std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&,
std::__1::shared_ptr<arrow::DataType> const&>(this=0x000000014b883640,
__a=allocator<arrow::ChunkedArray> @ 0x0000000170a0c2af, __args=size=1,
__args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48
strong=5977 weak=2) at shared_ptr.h:298:41
frame #12: 0x00000001271b3a80
libarrow.1200.0.0.dylib`std::__1::__shared_ptr_emplace<arrow::ChunkedArray,
std::__1::allocator<arrow::ChunkedArray>
>::__shared_ptr_emplace<std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&,
std::__1::shared_ptr<arrow::DataType> const&>(this=0x000000014b883640,
__a=allocator<arrow::ChunkedArray> @ 0x0000000170a0c2ef, __args=size=1,
__args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48
strong=5977 weak=2) at shared_ptr.h:292:5
frame #13: 0x00000001271b39c0
libarrow.1200.0.0.dylib`std::__1::shared_ptr<arrow::ChunkedArray>
std::__1::allocate_shared<arrow::ChunkedArray,
std::__1::allocator<arrow::ChunkedArray>,
std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&,
std::__1::shared_ptr<arrow::DataType> const&, void>(__a=0x0000000170a0c3c7,
__args=size=1, __args=std::__1::shared_ptr<arrow::DataType>::element_type @
0x0000000103516c48 strong=5977 weak=2) at shared_ptr.h:1106:55
frame #14: 0x00000001271a0920
libarrow.1200.0.0.dylib`std::__1::shared_ptr<arrow::ChunkedArray>
std::__1::make_shared<arrow::ChunkedArray,
std::__1::vector<std::__1::shared_ptr<arrow::Array>,
std::__1::allocator<std::__1::shared_ptr<arrow::Array> > >&,
std::__1::shared_ptr<arrow::DataType> const&, void>(__args=size=1,
__args=std::__1::shared_ptr<arrow::DataType>::element_type @ 0x0000000103516c48
strong=5977 weak=2) at shared_ptr.h:1115:12
```
I had to set `TOTAL = 2**10` to avoid the segfault:
```
<pyarrow._parquet.FileMetaData object at 0x12b79c270>
created_by: parquet-cpp-arrow version 12.0.0-SNAPSHOT
num_columns: 1
num_rows: 1024
num_row_groups: 1024
format_version: 1.0
serialized_size: 97276
Process 75050 exited with status = 0 (0x00000000)
```