Hi All,
We're observing a SIGSEGV in gRPC 1.20. We're implementing the gNMI proto as
a reference API.
The SIGSEGV comes from a thread that is simply blocked in
CompletionQueue::Next(). Other threads are concurrently issuing
streaming (Subscribe) and unary (Set) RPCs.
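For reference, the blocked thread is doing nothing more than a standard
completion-queue drain loop, roughly like this (simplified sketch, not our
exact code; the real method in the backtrace below is
Client::CrudService::AsyncNotifyChannelStateMonitor):

#include <grpcpp/grpcpp.h>

// Each worker thread owns its grpc::CompletionQueue and drains it here.
void DrainCompletionQueue(grpc::CompletionQueue* cq) {
  void* tag = nullptr;
  bool ok = false;
  // Next() blocks until an event is available or the queue is shut down.
  while (cq->Next(&tag, &ok)) {
    // Dispatch on 'tag' to whatever operation completed; 'ok' reports
    // whether that operation finished successfully.
  }
  // Next() returns false only after Shutdown() has been called on the
  // queue and all pending events have been drained.
}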
The SIGSEGV traces into gRPC core: kick_one_worker in ev_epollex_linux.cc
reads 'specific_worker->pollable_obj', which is NULL, so the subsequent lock
of p->mu faults (the mu=0x78 in frame #1 below is consistent with a null p
plus the offset of mu within pollable).
Is there any insight into what could cause this pollable_obj to become null?
Is this resource shared across threads? Each of our threads manages its own
private completion queue, along the lines of the sketch below.
Any ideas where we should be looking?
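To illustrate what we mean by a private completion queue, a worker thread
issuing a unary Set looks roughly like this (simplified sketch, not our exact
code; gnmi::gNMI::Stub and the message types assume the standard gNMI
codegen, everything else is illustrative):

#include <memory>
#include <grpcpp/grpcpp.h>
#include "gnmi.grpc.pb.h"  // generated from the gNMI proto

void IssueSet(gnmi::gNMI::Stub* stub, const gnmi::SetRequest& request) {
  grpc::CompletionQueue cq;  // private to this thread, never shared
  grpc::ClientContext context;
  gnmi::SetResponse response;
  grpc::Status status;

  // Start the async unary call, bound to this thread's completion queue.
  std::unique_ptr<grpc::ClientAsyncResponseReader<gnmi::SetResponse>> rpc =
      stub->AsyncSet(&context, request, &cq);
  rpc->Finish(&response, &status, reinterpret_cast<void*>(1));

  // Wait on this thread's own queue for the Finish() event.
  void* tag = nullptr;
  bool ok = false;
  cq.Next(&tag, &ok);
}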
Thanks,
Bryce
static grpc_error* kick_one_worker(grpc_pollset_worker* specific_worker) {
  GPR_TIMER_SCOPE("kick_one_worker", 0);
  pollable* p = specific_worker->pollable_obj;  // <---- NULL!
  grpc_core::MutexLock lock(&p->mu);
  GPR_ASSERT(specific_worker != nullptr);
  if (specific_worker->kicked) {
    if (grpc_polling_trace.enabled()) {
      gpr_log(GPR_INFO, "PS:%p kicked_specific_but_already_kicked", p);
    }
    GRPC_STATS_INC_POLLSET_KICKED_AGAIN();
    return GRPC_ERROR_NONE;
  }
(gdb) bt
#0 0x0000007fb95a8d34 in pthread_mutex_lock () from
/lib/aarch64-linux-gnu/libpthread.so.0
#1 0x0000007fb8d73a0c in gpr_mu_lock (mu=mu@entry=0x78) at
src/core/lib/gpr/sync_posix.cc:103
#2 0x0000007fb8dfb59c in grpc_core::MutexLock::MutexLock (mu=0x78,
this=<synthetic pointer>) at ./src/core/lib/gprpp/mutex_lock.h:30
#3 kick_one_worker (specific_worker=specific_worker@entry=0x7f697eeca8) at
src/core/lib/iomgr/ev_epollex_linux.cc:694
#4 0x0000007fb8dfbf40 in pollset_kick_all (pollset=<optimized out>) at
src/core/lib/iomgr/ev_epollex_linux.cc:793
#5 0x0000007fb8dfbf9c in pollset_shutdown (pollset=0x7f780267c8,
closure=<optimized out>) at src/core/lib/iomgr/ev_epollex_linux.cc:866
#6 0x0000007fb8d8f80c in cq_end_op_for_pluck (cq=0x7f780266d0,
tag=0x7f697eef20, error=0x0, done=0x7fb8d8bdf8
<finish_batch_completion(void*, grpc_cq_completion*)>,
done_arg=0x7f78137de0, storage=0x7f78137e30) at
src/core/lib/surface/completion_queue.cc:790
#7 0x0000007fb8d8c278 in receiving_trailing_metadata_ready
(bctlp=0x7f78137de0, error=<optimized out>) at
src/core/lib/surface/call.cc:1480
#8 0x0000007fb8d79b68 in exec_ctx_run (closure=<optimized out>, error=0x0)
at src/core/lib/iomgr/exec_ctx.cc:40
#9 0x0000007fb8d79b68 in exec_ctx_run (closure=<optimized out>, error=0x0)
at src/core/lib/iomgr/exec_ctx.cc:40
#10 0x0000007fb8da6768 in grpc_closure_run (error=0x0, c=0x7f78137970) at
./src/core/lib/iomgr/closure.h:259
#11 grpc_core::SubchannelCall::RecvTrailingMetadataReady (arg=<optimized
out>, error=<optimized out>) at
src/core/ext/filters/client_channel/subchannel.cc:289
#12 0x0000007fb8d79b68 in exec_ctx_run (closure=<optimized out>, error=0x0)
at src/core/lib/iomgr/exec_ctx.cc:40
#13 0x0000007fb8d79b68 in exec_ctx_run (closure=<optimized out>, error=0x0)
at src/core/lib/iomgr/exec_ctx.cc:40
#14 0x0000007fb8d79ddc in exec_ctx_run (error=0x0, closure=<optimized out>)
at src/core/lib/iomgr/exec_ctx.cc:148
#15 grpc_core::ExecCtx::Flush (this=0x7f837fd680) at
src/core/lib/iomgr/exec_ctx.cc:148
#16 0x0000007fb8dfc880 in pollset_work (pollset=0x55965fa108,
worker_hdl=<optimized out>, deadline=<optimized out>) at
./src/core/lib/iomgr/exec_ctx.h:213
#17 0x0000007fb8d9058c in cq_next (cq=0x55965fa040, deadline=...,
reserved=<optimized out>) at src/core/lib/surface/completion_queue.cc:1021
#18 0x0000007fb8d732b4 in grpc::CompletionQueue::AsyncNextInternal
(this=0x55965f2fe8, tag=0x7f837fd778, ok=0x7f837fd777, deadline=...) at
src/cpp/common/completion_queue_cc.cc:48
#19 0x0000007fb8d593c4 in grpc::CompletionQueue::Next(void**, bool*) () at
src/cpp/common/completion_queue_cc.cc:91
#20 0x0000007fb8d509d8 in
Client::CrudService::AsyncNotifyChannelStateMonitor() () at
src/cpp/common/completion_queue_cc.cc:91
#21 0x0000007fb8d5dce8 in void std::__invoke_impl<void, void
(Client::CrudService::*)(),
Client::CrudService*>(std::__invoke_memfun_deref, void
(Client::CrudService::*&&)(), Client::CrudService*&&) () at
src/cpp/common/completion_queue_cc.cc:91
#22 0x0000007fb8d5bf3c in std::__invoke_result<void
(Client::CrudService::*)(), Client::CrudService*>::type std::__invoke<void
(Client::CrudService::*)(), Client::CrudService*>(void
(Client::CrudService::*&&)(), Client::CrudService*&&) ()
at src/cpp/common/completion_queue_cc.cc:91
#23 0x0000007fb8d63434 in decltype (__invoke((_S_declval<0ul>)(),
(_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void
(Client::CrudService::*)(), Client::CrudService*> >::_M_invoke<0ul,
1ul>(std::_Index_tuple<0ul, 1ul>) () at
src/cpp/common/completion_queue_cc.cc:91
#24 0x0000007fb8d6332c in std::thread::_Invoker<std::tuple<void
(Client::CrudService::*)(), Client::CrudService*> >::operator()() () at
src/cpp/common/completion_queue_cc.cc:91
#25 0x0000007fb8d63220 in
std::thread::_State_impl<std::thread::_Invoker<std::tuple<void
(Client::CrudService::*)(), Client::CrudService*> > >::_M_run() () at
src/cpp/common/completion_queue_cc.cc:91
#26 0x0000007fb7a471f4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)