zyearn commented on issue #1648: URL: https://github.com/apache/incubator-brpc/issues/1648#issuecomment-1002745519
I discussed this offline with @lintanghui. The scenario: A (a proxy) sends requests to B (a data node), and this stack was captured on A. The problem occurs when the network between A and B fails (a switch fault). B never returns an RPC response (the requests may not even have reached B), while A keeps accepting requests and sending them to B. A's memory keeps growing until A can no longer allocate bthread stacks, so later bthreads run on pthread stacks and the call path ends up in bthread::butex_wait_from_pthread, which leaves every worker waiting. At that point the timer is also stuck in sleep because the rq is full, so the only way out is a restart.

Rate limiting on A does not solve this. A->C/D/E are all fine, but over time the faulty node B takes up all of the available concurrency, and requests to C/D/E suffer as a result. In this situation you could try circuit breaking, or, instead of picking a downstream at random among B/C/D/E, pick one based on its number of inflight requests.
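To make the last suggestion concrete, here is a minimal sketch of picking a downstream by inflight request count. It is plain C++ and not brpc API: the class `LeastInflightSelector`, its methods, and the per-node cap `max_inflight_per_node` are all hypothetical names introduced for illustration. The per-node cap is an extra assumption on top of what is described above; it is there so that a stuck node can hold at most a bounded share of concurrency.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of brpc): tracks inflight requests per
// downstream and always hands out the least-loaded node.
class LeastInflightSelector {
public:
    LeastInflightSelector(std::vector<std::string> nodes, int max_inflight_per_node)
        : nodes_(std::move(nodes)),
          inflight_(nodes_.size(), 0),
          max_inflight_per_node_(max_inflight_per_node) {}

    // Returns the index of the node with the fewest inflight requests and
    // counts one more inflight request against it. Returns -1 when every
    // node is already at the cap, so the caller can fail fast.
    int Acquire() {
        std::lock_guard<std::mutex> lock(mu_);
        int best = -1;
        for (std::size_t i = 0; i < inflight_.size(); ++i) {
            if (inflight_[i] >= max_inflight_per_node_) continue;  // node is saturated
            if (best < 0 || inflight_[i] < inflight_[best]) best = static_cast<int>(i);
        }
        if (best >= 0) ++inflight_[best];
        return best;
    }

    // Call when the request finishes (response, error, or timeout).
    void Release(int index) {
        std::lock_guard<std::mutex> lock(mu_);
        if (index >= 0 && static_cast<std::size_t>(index) < inflight_.size() &&
            inflight_[index] > 0) {
            --inflight_[index];
        }
    }

    int Inflight(int index) const {
        std::lock_guard<std::mutex> lock(mu_);
        return inflight_[index];
    }

    const std::string& Name(int index) const { return nodes_[index]; }

private:
    mutable std::mutex mu_;
    std::vector<std::string> nodes_;
    std::vector<int> inflight_;
    int max_inflight_per_node_;
};
```

With nodes B/C/D/E, a request to B that never completes keeps B's count above the others, so new requests go to C/D/E instead of piling up on B. This only helps if `Release` is reliably called on timeout as well as on success; otherwise the counts drift.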
