cbsheng opened a new issue, #2925:
URL: https://github.com/apache/brpc/issues/2925

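   **Describe the bug**
   When memory usage spikes (to 100GB+), I captured the bthread stacks and observed that the arguments of the sched_to function had been unexpectedly modified. Typical stacks are shown below: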
   
   > Bthread 50683:
   > #0  bthread::TaskGroup::sched_to (pg=0x7ffc42cb5000, next_meta=0x0) at /home/project/third_party/brpc/src/bthread/task_group.cpp:662
   > #1  0x00005611f605ba89 in bthread::TaskGroup::sched_to (next_tid=<optimized out>, pg=0x7fb4b94cb888)
   >     at /home/project/third_party/brpc/src/bthread/task_group_inl.h:79
   > #2  bthread::TaskGroup::sched (pg=pg@entry=0x7fb4b94cb888) at /home/project/third_party/brpc/src/bthread/task_group.cpp:600
   > #3  0x00005611f6226f7e in bthread::butex_wait (arg=arg@entry=0x7fc7b2d11d80, expected_value=expected_value@entry=257, abstime=abstime@entry=0x0, prepend=prepend@entry=false)
   >     at /home/project/third_party/brpc/src/bthread/butex.cpp:705
   > #4  0x00005611f60518c5 in bthread::mutex_lock_contended_impl (abstime=0x0, m=<optimized out>) at /home/project/third_party/brpc/src/bthread/mutex.cpp:1068
   > #5  bthread_mutex_lock_impl (m=<optimized out>, abstime=0x0) at /home/project/third_party/brpc/src/bthread/mutex.cpp:1220
   > #6  0x00005611f5dadde6 in bthread::Mutex::lock (this=<optimized out>) at /home/project/deps/brpc/include/bthread/mutex.h:59
   > #7  0x00005611f5dabc54 in comm::LockGuard::LockGuard (mtx=..., this=<synthetic pointer>) at /home/project/src/comm/mutex.h:14
   > #8  extnode::ECExtent::<lambda(comm::ErrorCode, uint32_t, s2s::dn::ReadAtExtentResponse*, bool)>::operator()(comm::ErrorCode, uint32_t, s2s::dn::ReadAtExtentResponse *, bool) const (__closure=0x7fcb58d73cd0, error_code=comm::REPLICA_PEER_CALL_ERR, shard_id=6, read_at_rsp=0x7fc89056ea00, reissue=<optimized out>)
   >     at /home/project/src/extentnode/redundant/ec_extent.cc:568
   > #9  0x00005611f5da0c9a in std::function<void (comm::ErrorCode, unsigned int, s2s::dn::ReadAtExtentResponse*, bool)>::operator()(comm::ErrorCode, unsigned int, s2s::dn::ReadAtExtentResponse*, bool) const (__args#3=<optimized out>, __args#2=<optimized out>, __args#1=<optimized out>, __args#0=<optimized out>, this=0x7fcb5847a498)
   >     at /usr/include/c++/9/bits/std_function.h:683
   > #10 extnode::ECExtent::<lambda(comm::ErrorCode)>::operator()(comm::ErrorCode) const (__closure=0x7fcb5847a450, error_code=comm::REPLICA_PEER_CALL_ERR)
   >     at /home/project/src/extentnode/redundant/ec_extent.cc:875
   > #11 0x00005611f5d3f3e8 in std::function<void (comm::ErrorCode)>::operator()(comm::ErrorCode) const (__args#0=<optimized out>, this=<optimized out>)
   >     at /usr/include/c++/9/bits/std_function.h:683
   > #12 extnode::BrpcShardCaller::<lambda()>::operator() (__closure=<optimized out>, __closure=<optimized out>)
   >     at /home/project/src/extentnode/rpc_caller/brpc_extnode_caller.cc:162
   > #13 std::_Function_handler<void(), extnode::BrpcShardCaller::readAtRpcCb(comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, comm::Callback)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
   > #14 0x00005611f5d40ddc in std::function<void ()>::operator()() const (this=0x7fb4b94cbda0) at /usr/include/c++/9/bits/std_function.h:683
   > #15 comm::DeferHelper::~DeferHelper (this=0x7fb4b94cbda0, __in_chrg=<optimized out>) at /home/project/src/comm/defer.h:12
   > #16 extnode::BrpcShardCaller::readAtRpcCb(comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, std::function<void (comm::ErrorCode)>) (this=<optimized out>,
   >     ctx=<optimized out>, cntl=0x7f874bebdd00, rpc_rsp=<optimized out>, cb=...) at /home/project/src/extentnode/rpc_caller/brpc_extnode_caller.cc:181
   > #17 0x00005611f5d445f1 in brpc::internal::MethodClosure4<extnode::BrpcShardCaller, extnode::BrpcShardCaller*, comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, std::function<void (comm::ErrorCode)> >::Run() (this=0x7fc55ff2eb70) at /usr/include/c++/9/bits/std_function.h:564
   > #18 0x00005611f6072c1b in brpc::Controller::EndRPC (this=0x7f874bebdd00, info=...) at /home/project/third_party/brpc/src/brpc/controller.cpp:968
   > #19 0x00005611f6072ef4 in brpc::Controller::RunEndRPC (arg=<optimized out>) at /home/project/third_party/brpc/src/brpc/controller.cpp:757
   > #20 0x00005611f605bdc7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /home/project/third_party/brpc/src/bthread/task_group.cpp:305
   > #21 0x00005611f62288b1 in bthread_make_fcontext ()
   > #22 0x0000000000000000 in ?? ()
   > 
   > Bthread 50684:
   > #0  bthread::TaskGroup::sched_to (pg=0x7ffc42cb5000, next_meta=0x0) at /home/project/third_party/brpc/src/bthread/task_group.cpp:662
   > #1  0x00005611f605ba89 in bthread::TaskGroup::sched_to (next_tid=<optimized out>, pg=0x7fc118bcb7c8)
   >     at /home/project/third_party/brpc/src/bthread/task_group_inl.h:79
   > #2  bthread::TaskGroup::sched (pg=pg@entry=0x7fc118bcb7c8) at /home/project/third_party/brpc/src/bthread/task_group.cpp:600
   > #3  0x00005611f6226f7e in bthread::butex_wait (arg=arg@entry=0x7fc6552c6d80, expected_value=expected_value@entry=257, abstime=abstime@entry=0x0, prepend=prepend@entry=false)
   >     at /home/project/third_party/brpc/src/bthread/butex.cpp:705
   > #4  0x00005611f60518c5 in bthread::mutex_lock_contended_impl (abstime=0x0, m=<optimized out>) at /home/project/third_party/brpc/src/bthread/mutex.cpp:1068
   > #5  bthread_mutex_lock_impl (m=<optimized out>, abstime=0x0) at /home/project/third_party/brpc/src/bthread/mutex.cpp:1220
   > #6  0x00005611f5dadde6 in bthread::Mutex::lock (this=<optimized out>) at /home/project/deps/brpc/include/bthread/mutex.h:59
   > #7  0x00005611f5da0de5 in comm::LockGuard::LockGuard (mtx=..., this=<synthetic pointer>) at /home/project/src/comm/mutex.h:14
   > #8  extnode::ECExtent::<lambda(comm::ErrorCode)>::operator()(comm::ErrorCode) const (__closure=0x7fc96cb98ad0, error_code=<optimized out>)
   >     at /home/project/src/extentnode/redundant/ec_extent.cc:890
   > #9  0x00005611f5d3f3e8 in std::function<void (comm::ErrorCode)>::operator()(comm::ErrorCode) const (__args#0=<optimized out>, this=<optimized out>)
   >     at /usr/include/c++/9/bits/std_function.h:683
   > #10 extnode::BrpcShardCaller::<lambda()>::operator() (__closure=<optimized out>, __closure=<optimized out>)
   >     at /home/project/src/extentnode/rpc_caller/brpc_extnode_caller.cc:162
   > #11 std::_Function_handler<void(), extnode::BrpcShardCaller::readAtRpcCb(comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, comm::Callback)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
   > #12 0x00005611f5d40ddc in std::function<void ()>::operator()() const (this=0x7fc118bcbab0) at /usr/include/c++/9/bits/std_function.h:683
   > #13 comm::DeferHelper::~DeferHelper (this=0x7fc118bcbab0, __in_chrg=<optimized out>) at /home/project/src/comm/defer.h:12
   > #14 extnode::BrpcShardCaller::readAtRpcCb(comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, std::function<void (comm::ErrorCode)>) (this=<optimized out>,
   >     ctx=<optimized out>, cntl=0x7fc9df812620, rpc_rsp=<optimized out>, cb=...) at /home/project/src/extentnode/rpc_caller/brpc_extnode_caller.cc:181
   > #15 0x00005611f5d445f1 in brpc::internal::MethodClosure4<extnode::BrpcShardCaller, extnode::BrpcShardCaller*, comm::Ctx*, brpc::Controller*, s2s::dn::ReadAtExtentResponse*, std::function<void (comm::ErrorCode)> >::Run() (this=0x7fc9df812970) at /usr/include/c++/9/bits/std_function.h:564
   > #16 0x00005611f6072c1b in brpc::Controller::EndRPC (this=0x7fc9df812620, info=...) at /home/project/third_party/brpc/src/brpc/controller.cpp:968
   > #17 0x00005611f607436e in brpc::Controller::OnVersionedRPCReturned (this=this@entry=0x7fc9df812620, info=..., new_bthread=new_bthread@entry=false,
   >     saved_error=saved_error@entry=0) at /home/project/third_party/brpc/src/brpc/controller.cpp:751
   > #18 0x00005611f60a76ab in brpc::ControllerPrivateAccessor::OnResponse (this=<synthetic pointer>, saved_error=0, id=...)
   >     at /home/project/third_party/brpc/src/brpc/details/controller_private_accessor.h:48
   > #19 brpc::policy::ProcessRpcResponse (msg_base=0x7fcb90062040) at /home/project/third_party/brpc/src/brpc/policy/baidu_rpc_protocol.cpp:819
   > #20 0x00005611f609d2db in brpc::ProcessInputMessage (void_arg=<optimized out>) at /home/project/third_party/brpc/src/brpc/input_messenger.cpp:184
   > #21 0x00005611f609f94b in brpc::InputMessenger::OnNewMessages (m=0x7fcb2cb32440) at /usr/include/c++/9/bits/atomic_base.h:493
   > #22 0x00005611f6180f95 in brpc::Socket::ProcessEvent (arg=0x7fcb2cb32440) at /home/project/third_party/brpc/src/brpc/socket.cpp:1196
   > #23 0x00005611f605bdc7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /home/project/third_party/brpc/src/bthread/task_group.cpp:305
   > #24 0x00005611f62288b1 in bthread_make_fcontext ()
   > #25 0x0000000000000000 in ?? ()
   
   
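   Two things puzzle me:
   1. When the sched_to overload that takes next_tid calls into the sched_to overload that takes next_meta, the value of pg changes. In every stack, the changed value is the same address, 0x7ffc42cb5000.
   2. next_meta is 0x0 in every stack.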
   
   
   **To Reproduce**
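   In my service environment, I trigger a large volume of read requests (e.g., 33K QPS), reading about 30Gb/s from the NIC for 60 to 120 s. The data is processed locally, and the processing logic uses a mutex and a condition variable, so the service's memory spikes to 100GB+ within a short time.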
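For scale, a back-of-the-envelope estimate of the data volume in this scenario, assuming "Gb" means gigabits, decimal units (1 GB = 10^9 bytes), and that the 30Gb/s is spread evenly across the 33K QPS:

$$
\begin{aligned}
\frac{30\ \text{Gb/s}}{8\ \text{bit/B}} &= 3.75\ \text{GB/s} \\
3.75\ \text{GB/s} \times 60\ \text{s} &= 225\ \text{GB}, \qquad 3.75\ \text{GB/s} \times 120\ \text{s} = 450\ \text{GB} \\
\frac{3.75 \times 10^{9}\ \text{B/s}}{3.3 \times 10^{4}\ \text{req/s}} &\approx 1.14 \times 10^{5}\ \text{B} \approx 114\ \text{KB per request}
\end{aligned}
$$

Under these assumptions, the 100GB+ peak is on the order of 22% to 44% of everything read during a 60 to 120 s window.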
   
   
   **Expected behavior**
   
   
   **Versions**
   OS: 5.15.0
   Compiler: c++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
   brpc: v1.12.1
   protobuf:
   
   **Additional context/screenshots**
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@brpc.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

