mrd0ll4r commented on issue #46814:
URL: https://github.com/apache/arrow/issues/46814#issuecomment-2976409266

   Oh nice, I did that:
   
   ```
   > open_dataset("data/bluesky/labeler_logs_dirty_parquet") %>%
     group_by(uri) %>%
     tally() %>%
     filter(n==1) %>%
     tally() %>%
     collect()
   [New Thread 0x7fffe8c216c0 (LWP 71326)]
   [New Thread 0x7fffe3fff6c0 (LWP 71327)]
   [New Thread 0x7fffe37fe6c0 (LWP 71328)]
   [New Thread 0x7fffe2ffd6c0 (LWP 71329)]
   [New Thread 0x7fffe27fc6c0 (LWP 71330)]
   [New Thread 0x7fffe1ffb6c0 (LWP 71331)]
   [New Thread 0x7fffe17fa6c0 (LWP 71332)]
   [New Thread 0x7fffe0ff96c0 (LWP 71333)]
   [New Thread 0x7fffc3fff6c0 (LWP 71334)]
   [New Thread 0x7fffc37fe6c0 (LWP 71335)]
   [New Thread 0x7fffc2ffd6c0 (LWP 71336)]
   [New Thread 0x7fffc27fc6c0 (LWP 71337)]
   [New Thread 0x7fffc1ffb6c0 (LWP 71338)]
   [New Thread 0x7fffc17fa6c0 (LWP 71339)]
   [New Thread 0x7fffc0ff96c0 (LWP 71340)]
   [New Thread 0x7fffa3fff6c0 (LWP 71341)]
   [New Thread 0x7fff9b7fe6c0 (LWP 71342)]
   [New Thread 0x7fffa37fe6c0 (LWP 71343)]
   [New Thread 0x7fffa2ffd6c0 (LWP 71344)]
   [New Thread 0x7fffa27fc6c0 (LWP 71345)]
   [New Thread 0x7fffa1ffb6c0 (LWP 71346)]
   [New Thread 0x7fffa17fa6c0 (LWP 71347)]
   [New Thread 0x7fffa0ff96c0 (LWP 71348)]
   [New Thread 0x7fff9bfff6c0 (LWP 71349)]
   [New Thread 0x7fff9affd6c0 (LWP 71350)]
   [New Thread 0x7fff9a7fc6c0 (LWP 71351)]
   [New Thread 0x7fff99ffb6c0 (LWP 71352)]
   [New Thread 0x7fff997fa6c0 (LWP 71353)]
   [New Thread 0x7fff98ff96c0 (LWP 71354)]
   [New Thread 0x7fff63fff6c0 (LWP 71355)]
   [New Thread 0x7fff5bfff6c0 (LWP 71356)]
   [New Thread 0x7fff637fe6c0 (LWP 71357)]
   [New Thread 0x7fff62ffd6c0 (LWP 71358)]
   [New Thread 0x7fff627fc6c0 (LWP 71359)]
   [New Thread 0x7fff61ffb6c0 (LWP 71360)]
   [New Thread 0x7fff617fa6c0 (LWP 71361)]
   [New Thread 0x7fff60ff96c0 (LWP 71362)]
   [New Thread 0x7fff5b7fe6c0 (LWP 71363)]
   [New Thread 0x7fff5affd6c0 (LWP 71364)]
   [New Thread 0x7fff5a7fc6c0 (LWP 71365)]
   [New Thread 0x7fff59ffb6c0 (LWP 71366)]
   [New Thread 0x7fff597fa6c0 (LWP 71367)]
   [New Thread 0x7fff58ff96c0 (LWP 71368)]
   [New Thread 0x7fff23fff6c0 (LWP 71369)]
   [New Thread 0x7fff1b7fe6c0 (LWP 71370)]
   [New Thread 0x7fff237fe6c0 (LWP 71373)]
   [New Thread 0x7fff22ffd6c0 (LWP 71374)]
   [New Thread 0x7fff227fc6c0 (LWP 71375)]
   [New Thread 0x7fff21ffb6c0 (LWP 71376)]
   [New Thread 0x7fff217fa6c0 (LWP 71377)]
   [New Thread 0x7fff20ff96c0 (LWP 71378)]
   [New Thread 0x7fff1bfff6c0 (LWP 71379)]
   [New Thread 0x7fff1affd6c0 (LWP 71380)]
   [New Thread 0x7fff1a7fc6c0 (LWP 71381)]
   [New Thread 0x7fff19ffb6c0 (LWP 71382)]
   [New Thread 0x7fff197fa6c0 (LWP 71383)]
   [New Thread 0x7fff18ff96c0 (LWP 71384)]
   [New Thread 0x7ffeebfff6c0 (LWP 71385)]
   [New Thread 0x7ffeeb7fe6c0 (LWP 71386)]
   [New Thread 0x7ffeeaffd6c0 (LWP 71387)]
   [New Thread 0x7ffeea7fc6c0 (LWP 71388)]
   [New Thread 0x7ffee9ffb6c0 (LWP 71389)]
   [New Thread 0x7ffee97fa6c0 (LWP 71390)]
   [New Thread 0x7ffee8ff96c0 (LWP 71391)]
   [New Thread 0x7ffec7fff6c0 (LWP 71392)]
   [New Thread 0x7ffec77fe6c0 (LWP 71393)]
   [New Thread 0x7ffec6ffd6c0 (LWP 71394)]
   [New Thread 0x7ffec67fc6c0 (LWP 71395)]
   [New Thread 0x7ffec5ffb6c0 (LWP 71396)]
   [New Thread 0x7ffec57fa6c0 (LWP 71397)]
   [New Thread 0x7ffec4ff96c0 (LWP 71398)]
   [New Thread 0x7ffea7fff6c0 (LWP 71399)]
   [New Thread 0x7ffea77fe6c0 (LWP 71400)]
   
   Thread 25 "R" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x7fffa17fa6c0 (LWP 71347)]
   __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:273
   273     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
   (gdb) bt
   #0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:273
   #1  0x00007fffeb76cbfd in memcpy (__len=70, __src=<optimized out>, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
   #2  arrow::BufferBuilder::UnsafeAppend (length=70, data=<optimized out>, this=0x7fffa17f89c0) at /tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/buffer_builder.h:143
   #3  arrow::TypedBufferBuilder<unsigned char, void>::UnsafeAppend (num_elements=70, values=<optimized out>, this=0x7fffa17f89c0)
       at /tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/buffer_builder.h:268
   #4  arrow::compute::internal::(anonymous namespace)::BinaryFilterImpl<arrow::BinaryType> (ctx=<optimized out>, out=0x7fff7c036350, null_selection=arrow::compute::FilterOptions::DROP, output_length=26613,
       filter=..., values=...) at /tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/kernels/vector_selection_filter_internal.cc:709
   #5  arrow::compute::internal::(anonymous namespace)::BinaryFilterExec (ctx=<optimized out>, batch=..., out=<optimized out>)
       at /tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/kernels/vector_selection_filter_internal.cc:844
   #6  0x00007fffeb611232 in arrow::compute::detail::(anonymous 
namespace)::VectorExecutor::Exec (this=this@entry=0x7fff3401de00, span=..., 
listener=listener@entry=0x7fffa17f8dc0)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/exec.cc:1109
   #7  0x00007fffeb611a64 in arrow::compute::detail::(anonymous 
namespace)::VectorExecutor::Execute (this=0x7fff3401de00, batch=..., 
listener=0x7fffa17f8dc0)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/exec.cc:1049
   #8  0x00007fffeb63233d in 
arrow::compute::detail::FunctionExecutorImpl::Execute (this=0x7fffcdb6f880, 
args=std::vector of length 2, capacity 2 = {...}, passed_length=-1)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/function.cc:278
   #9  0x00007fffeb62ed56 in arrow::compute::(anonymous 
namespace)::ExecuteInternal (func=..., args=std::vector of length 2, capacity 2 
= {...}, passed_length=passed_length@entry=-1,
       options=options@entry=0x7fffa17f97c0, ctx=ctx@entry=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/function.cc:343
   #10 0x00007fffeb62f297 in arrow::compute::Function::Execute 
(this=0x55555c117580, args=..., options=0x7fffa17f97c0, ctx=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/function.cc:350
   #11 0x00007fffeb60dc41 in arrow::compute::CallFunction 
(func_name="array_filter", args=std::vector of length 2, capacity 2 = {...}, 
options=options@entry=0x7fffa17f97c0,
       ctx=ctx@entry=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/exec.cc:1369
   #12 0x00007fffeb76e071 in arrow::compute::internal::(anonymous 
namespace)::FilterMetaFunction::ExecuteImpl (this=<optimized out>, 
args=std::vector of length 2, capacity 2 = {...}, options=0x7fffa17f97c0,
       ctx=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>) at 
/usr/include/c++/12/bits/basic_string.tcc:238
   #13 0x00007fffeb62dfa7 in arrow::compute::MetaFunction::Execute 
(this=0x55555910a0e0, args=std::vector of length 2, capacity 2 = {...}, 
options=0x7fffa17f97c0,
       ctx=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/function.cc:483
   #14 0x00007fffeb60dc41 in arrow::compute::CallFunction (func_name="filter", 
args=std::vector of length 2, capacity 2 = {...}, 
options=options@entry=0x7fffa17f97c0,
       ctx=0x7fffed4bb420 
<arrow::compute::default_exec_context()::default_ctx>, ctx@entry=0x0) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/exec.cc:1369
   #15 0x00007fffeb5e8e77 in arrow::compute::Filter (values=..., filter=..., 
options=..., ctx=ctx@entry=0x0) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/compute/api_vector.cc:412
   #16 0x00007fffeb22c37b in arrow::acero::(anonymous 
namespace)::FilterNode::ProcessBatch (this=<optimized out>, batch=...)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/filter_node.cc:102
   #17 0x00007fffeb23e565 in arrow::acero::MapNode::InputReceived 
(this=this@entry=0x5555602ecab0, input=input@entry=0x555557f57050, batch=...)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/map_node.cc:76
   #18 0x00007fffeb2d9744 in 
arrow::acero::aggregate::GroupByNode::OutputNthBatch (this=0x555557f57050, 
n=1120) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/groupby_aggregate_node.cc:341
   #19 0x00007fffeb2d9883 in operator() (task_id=<optimized out>, 
__closure=<optimized out>) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/groupby_aggregate_node.cc:64
   #20 std::__invoke_impl<arrow::Status, 
arrow::acero::aggregate::GroupByNode::Init()::<lambda(size_t, int64_t)>&, long 
unsigned int, long int> (__f=...) at /usr/include/c++/12/bits/invoke.h:61
   #21 std::__invoke_r<arrow::Status, 
arrow::acero::aggregate::GroupByNode::Init()::<lambda(size_t, int64_t)>&, long 
unsigned int, long int> (__fn=...) at /usr/include/c++/12/bits/invoke.h:116
   #22 std::_Function_handler<arrow::Status(long unsigned int, long int), 
arrow::acero::aggregate::GroupByNode::Init()::<lambda(size_t, int64_t)> 
>::_M_invoke(const std::_Any_data &, unsigned long &&, long &&) (
       __functor=..., __args#0=<optimized out>, __args#1=<optimized out>) at 
/usr/include/c++/12/bits/std_function.h:291
   #23 0x00007fffeb289a7e in std::function<arrow::Status (unsigned long, 
long)>::operator()(unsigned long, long) const (__args#1=<optimized out>, 
__args#0=<optimized out>, this=<optimized out>)
       at /usr/include/c++/12/bits/std_function.h:591
   #24 arrow::acero::TaskSchedulerImpl::ExecuteTask (this=0x5555620679e0, 
thread_id=<optimized out>, group_id=<optimized out>, task_id=<optimized out>, 
task_group_finished=0x7fffa17f9c16)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/task_util.cc:212
   #25 0x00007fffeb28a2c1 in operator() (__closure=<optimized out>) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/task_util.cc:366
   #26 operator() (thread_id=14, __closure=0x7fff5402edc0) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/task_util.cc:366
   #27 std::__invoke_impl<arrow::Status, 
arrow::acero::TaskSchedulerImpl::ScheduleMore(size_t, int)::<lambda(size_t)>&, 
long unsigned int> (__f=...) at /usr/include/c++/12/bits/invoke.h:61
   #28 std::__invoke_r<arrow::Status, 
arrow::acero::TaskSchedulerImpl::ScheduleMore(size_t, int)::<lambda(size_t)>&, 
long unsigned int> (__fn=...) at /usr/include/c++/12/bits/invoke.h:116
   #29 std::_Function_handler<arrow::Status(long unsigned int), 
arrow::acero::TaskSchedulerImpl::ScheduleMore(size_t, int)::<lambda(size_t)> 
>::_M_invoke(const std::_Any_data &, unsigned long &&) (__functor=...,
       __args#0=<optimized out>) at /usr/include/c++/12/bits/std_function.h:291
   #30 0x00007fffeb24f034 in std::function<arrow::Status (unsigned 
long)>::operator()(unsigned long) const (__args#0=<optimized out>, 
this=0x7ffecc01b388) at /usr/include/c++/12/bits/std_function.h:591
   #31 operator() (__closure=0x7ffecc01b380) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/acero/query_context.cc:72
   #32 std::__invoke_impl<arrow::Status, 
arrow::acero::QueryContext::ScheduleTask(std::function<arrow::Status(long 
unsigned int)>, std::string_view)::<lambda()>&> (__f=...) at 
/usr/include/c++/12/bits/invoke.h:61
   #33 std::__invoke_r<arrow::Status, 
arrow::acero::QueryContext::ScheduleTask(std::function<arrow::Status(long 
unsigned int)>, std::string_view)::<lambda()>&> (__fn=...) at 
/usr/include/c++/12/bits/invoke.h:116
   #34 std::_Function_handler<arrow::Status(), 
arrow::acero::QueryContext::ScheduleTask(std::function<arrow::Status(long 
unsigned int)>, std::string_view)::<lambda()> >::_M_invoke(const std::_Any_data 
&) (
       __functor=...) at /usr/include/c++/12/bits/std_function.h:291
   #35 0x00007fffeb250e5f in std::function<arrow::Status ()>::operator()() 
const (this=<optimized out>) at /usr/include/c++/12/bits/std_function.h:591
   #36 arrow::detail::ContinueFuture::operator()<std::function<arrow::Status 
()>&, , arrow::Status, arrow::Future<arrow::internal::Empty> 
>(arrow::Future<arrow::internal::Empty>, std::function<arrow::Status ()>&) 
const (this=<optimized out>, f=..., next=...) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/util/future.h:150
   #37 std::__invoke_impl<void, arrow::detail::ContinueFuture&, 
arrow::Future<arrow::internal::Empty>&, std::function<arrow::Status 
()>&>(std::__invoke_other, arrow::detail::ContinueFuture&, 
arrow::Future<arrow::internal::Empty>&, std::function<arrow::Status ()>&) 
(__f=...) at /usr/include/c++/12/bits/invoke.h:61
   #38 std::__invoke<arrow::detail::ContinueFuture&, 
arrow::Future<arrow::internal::Empty>&, std::function<arrow::Status 
()>&>(arrow::detail::ContinueFuture&, arrow::Future<arrow::internal::Empty>&, 
std::function<arrow::Status ()>&) (__fn=...) at 
/usr/include/c++/12/bits/invoke.h:96
   #39 std::_Bind<arrow::detail::ContinueFuture 
(arrow::Future<arrow::internal::Empty>, std::function<arrow::Status 
()>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) 
(__args=...,
       this=<optimized out>) at /usr/include/c++/12/functional:484
   #40 std::_Bind<arrow::detail::ContinueFuture 
(arrow::Future<arrow::internal::Empty>, std::function<arrow::Status 
()>)>::operator()<, void>() (this=<optimized out>) at 
/usr/include/c++/12/functional:567
   #41 arrow::internal::FnOnce<void 
()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture 
(arrow::Future<arrow::internal::Empty>, std::function<arrow::Status ()>)> 
>::invoke() (this=<optimized out>)
       at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/util/functional.h:152
   #42 0x00007fffec3e969f in arrow::internal::FnOnce<void ()>::operator()() && 
(this=0x7fffa17f9d30) at /usr/include/c++/12/bits/unique_ptr.h:191
   #43 arrow::internal::WorkerLoop (it=..., state=...) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/util/thread_pool.cc:478
   #44 operator() (__closure=<optimized out>) at 
/tmp/RtmpvG0xfW/R.INSTALL2875d41805ac1/arrow/tools/cpp/src/arrow/util/thread_pool.cc:643
   #45 std::__invoke_impl<void, 
arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > (__f=...) 
at /usr/include/c++/12/bits/invoke.h:61
   #46 
std::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()>
 > (__fn=...) at /usr/include/c++/12/bits/invoke.h:96
   #47 
std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()>
 > >::_M_invoke<0> (this=<optimized out>) at 
/usr/include/c++/12/bits/std_thread.h:252
   #48 
std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()>
 > >::operator() (this=<optimized out>) at 
/usr/include/c++/12/bits/std_thread.h:259
   #49 
std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()>
 > > >::_M_run(void) (this=<optimized out>)
       at /usr/include/c++/12/bits/std_thread.h:210
   #50 0x00007ffff4cd44a3 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #51 0x00007ffff78a81f5 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
   #52 0x00007ffff792889c in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
   ```
   
   I guess a debug build would help, or symbols for a few more things.
   I'm not a debugging expert, so let me know what else I can do!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
