crystalyouth opened a new issue #1085: PrometheusMetricsService causes a deadlock
URL: https://github.com/apache/incubator-brpc/issues/1085

**Describe the bug**

Our distributed-system SDK is wrapped so that it can be called from Python. We found that all of the SDK's bthread worker threads are stuck. The relevant call stacks are:

```
Thread 12 (Thread 0x7fa6b7fff700 (LWP 6104)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7fa6d4ff9700 (LWP 6103)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7fa6d57fa700 (LWP 6102)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7fa6d5ffb700 (LWP 6101)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7fa6d67fc700 (LWP 6100)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fa6d6ffd700 (LWP 6099)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6bb5 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4b4e009 in bvar::PerSecond<bvar::PassiveStatus<unsigned long> >::get_value(long) const () from /usr/lib/libbthread.so
#3  0x00007fa6e4b4ddf6 in bvar::detail::WindowBase<bvar::PassiveStatus<unsigned long>, (bvar::SeriesFrequency)1>::describe(std::ostream&, bool) const () from /usr/lib/libbthread.so
#4  0x00007fa6e4ad953f in bvar::Variable::describe_exposed(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream&, bool, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#5  0x00007fa6e4adce2b in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#6  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#8  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#11 0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#12 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#13 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#14 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fa6d77fe700 (LWP 6098)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fa6d7fff700 (LWP 6097)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fa6dc9ce700 (LWP 6095)):
#0  0x00007fa6ee3fcf7c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa6ee3f6c26 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fa6e4adae81 in bvar::Variable::list_exposed(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bvar::DisplayFilter) () from /usr/lib/libbvar.so
#3  0x00007fa6e4adbf56 in bvar::Variable::dump_exposed(bvar::Dumper*, bvar::DumpOptions const*) () from /usr/lib/libbvar.so
#4  0x00007fa6e5358c77 in brpc::PrometheusMetricsService::default_method(google::protobuf::RpcController*, brpc::MetricsRequest const*, brpc::MetricsResponse*, google::protobuf::Closure*) () from /usr/lib/libbrpc.so
#5  0x00007fa6e463da64 in brpc::metrics::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from /usr/lib/libcc_brpc_internal_proto.so
#6  0x00007fa6e5406acb in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from /usr/lib/libbrpc.so
#7  0x00007fa6e53bddf7 in brpc::ProcessInputMessage(void*) () from /usr/lib/libbrpc.so
#8  0x00007fa6e53bef21 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from /usr/lib/libbrpc.so
#9  0x00007fa6e5488b20 in brpc::Socket::ProcessEvent(void*) () from /usr/lib/libbrpc.so
#10 0x00007fa6e4b478fd in bthread::TaskGroup::task_runner(long) () from /usr/lib/libbthread.so
#11 0x00007fa6e4b21241 in bthread_make_fcontext () from /usr/lib/libbthread.so
#12 0x0000000000000024 in ?? ()
#13 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fa6dd1cf700 (LWP 6094)):
#0  0x00007fa6eda27469 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fa6e4b4cfa0 in bthread::TimerThread::run() () from /usr/lib/libbthread.so
#2  0x00007fa6e4b4ddb9 in bthread::TimerThread::run_this(void*) () from /usr/lib/libbthread.so
#3  0x00007fa6ee3f44a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007fa6eda2bd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

The call stacks of threads 4-12 all enter through `PrometheusMetricsService::default_method`.

Thread 7 calls `PerSecond<PassiveStatus<unsigned long>>::get_value` and is waiting on a lock there.

The lock that the remaining threads (4-6 and 8-12) are waiting for was presumably acquired by thread 7 while it was in describe.

Our code does not define any variable of type `PerSecond<PassiveStatus<unsigned long>>`.

In the brpc code, some of the bvars in default_variables.cpp have this type, for example the ProcStat ones. These variables use a `CachedReader` as an optimization, and that is one place where a lock is taken. Judging from the code, however, a deadlock should not be possible there.

**To Reproduce**

We have not yet identified a scenario that reproduces the problem.

**Versions**

OS: debian 9.9
Compiler: gcc version 6.3.0 20170516
brpc: 1b9e006
protobuf:
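
The wait chain suggested by the stacks can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions, not brpc code: `registry_mutex`, `reader_mutex` and `count_blocked_listers` are hypothetical names, the sketch assumes that the lock needed by `list_exposed` is held across the describe call, and it says nothing about who actually owns the inner lock in the real process.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins, NOT brpc code:
//   registry_mutex plays the role of the lock the threads in list_exposed wait for,
//   reader_mutex   plays the role of the lock thread 7 waits for in get_value.
int count_blocked_listers(int num_listers) {
    std::timed_mutex registry_mutex;
    std::mutex reader_mutex;

    // Simulate the unknown owner: the inner lock is already taken.
    reader_mutex.lock();

    std::promise<void> registry_held;
    std::thread describer([&] {
        std::lock_guard<std::timed_mutex> outer(registry_mutex);
        registry_held.set_value();
        // Blocks here while still holding registry_mutex (the role of thread 7).
        std::lock_guard<std::mutex> inner(reader_mutex);
    });
    // Do not start the listers before the describer owns the outer lock.
    registry_held.get_future().wait();

    std::atomic<int> blocked{0};
    std::vector<std::thread> listers;
    for (int i = 0; i < num_listers; ++i) {
        listers.emplace_back([&] {
            // A real lister would wait forever; the timeout only lets the demo end.
            if (registry_mutex.try_lock_for(std::chrono::milliseconds(50))) {
                registry_mutex.unlock();
            } else {
                ++blocked;
            }
        });
    }
    for (auto& t : listers) t.join();

    // Release the inner lock so the describer can finish and be joined.
    reader_mutex.unlock();
    describer.join();
    return blocked.load();
}
```

With eight listers, matching the eight threads stuck in `list_exposed`, `count_blocked_listers(8)` returns 8: every lister times out because the describing thread never releases the outer lock while it waits for the inner one. In the real process the useful next step would be to find the owner of the mutex thread 7 is waiting on, since that is the part this sketch simply fakes.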
