ustccy opened a new issue #1177: URL: https://github.com/apache/incubator-brpc/issues/1177
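**Describe the bug**

A single network jitter caused about_to_quit of the epoll_wait bthread to be set to true, so the epoll_wait bthread no longer wakes other workers when it is switched out. As a result, most workers go to sleep permanently.

The normal switching flow:

1. epoll_wait is woken by new events. Its thread, A, starts processing the EPOLL events.
2. Thread A handles an EPOLLIN event by calling StartInputEvent->bthread_start_urgent->start_foreground->set_remained, which preserves the current call stack of epoll_wait, and then jump_stack to the task_runner function inside sched_to.
3. In task_runner, thread A first calls the remained function (ready_to_run_in_worker), which pushes the epoll_wait bthread into the rq and wakes one thread.
4. Thread A enters ProcessEvent->OnNewMessages and starts processing the first EPOLLIN event. Thread B, once woken, steals the epoll_wait bthread and continues with the following EPOLLIN events. The cycle then repeats.

When about_to_quit of epoll_wait is true, step 3 handles the remained function through ready_to_run_in_worker_ignoresignal instead. That function only enqueues the epoll_wait bthread and does not wake anyone. Without the wake-up, only the workers that were still running when about_to_quit was set will steal epoll_wait after finishing their current bthread; the workers that were in wait_task at that moment can never be woken again. Subsequent EPOLL events can only be processed by these few workers, and overall processing latency rises.

The status of the epoll_wait bthread, obtained through the builtin service, shows that about_to_quit has been set to true.

Analysis shows that epoll_wait does not start a new bthread when handling EPOLLOUT|EPOLLERR|EPOLLHUP, so EPOLLERR handling runs inline on the epoll_wait bthread. When controller.IssueRPC hits a network problem, epoll_wait receives an EPOLLERR event and calls, in order:

HandleEpollOut->HandleEpollOutRequest->KeepWriteIfConnected->CheckConnectedAndKeepWrite->AfterAppConnected->ReturnFailedWriteRequest->bthread_id_error2->controller::HandleSocketFailed->OnVersionedRPCReturned->IssueRPC

If HandleSendFailed() is reached inside the IssueRPC call of the last retry while running on the epoll_wait bthread, and the RPC is synchronous (_done==NULL), execution enters OnVersionedRPCReturned(new_bthread=false)->EndRPC()->bthread_about_to_quit(). This wrongly sets about_to_quit of the epoll_wait bthread to true, which disables the wake-up mechanism from then on.

**To Reproduce**

First trigger a network error so that the RPC keeps retrying. On the last retry, make const int rc = _lb->SelectServer() in Controller::IssueRPC return non-zero.

**Expected behavior**

about_to_quit of epoll_wait should always be false.

**Versions**

This should be independent of version.

- OS: ubuntu16.04
- Compiler: gcc 5.4.0
- brpc: all
- protobuf: ~

**Additional context/screenshots**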
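The effect of the two enqueue paths named in the report can be illustrated with a small toy model. This is only a sketch under simplifying assumptions and is not brpc source: `ToyScheduler`, `enqueue_and_signal`, `enqueue_nosignal` and `simulate_handoffs` are hypothetical names, workers are plain counters rather than threads, and workers never go back to sleep. It only shows that when every hand-off of the epoll_wait task skips the signal, the set of awake workers never grows.

```cpp
// Toy model only, NOT brpc code. All type and function names here are
// hypothetical; real brpc scheduling is considerably more involved.
#include <cassert>
#include <cstddef>
#include <deque>

struct ToyScheduler {
    std::deque<int> run_queue;   // ids of runnable tasks
    std::size_t active_workers;  // workers that are running or stealing
    std::size_t parked_workers;  // workers asleep, waiting for a signal

    ToyScheduler(std::size_t active, std::size_t parked)
        : active_workers(active), parked_workers(parked) {}

    // Stand-in for the ready_to_run_in_worker path:
    // enqueue the task, then wake one parked worker if there is one.
    void enqueue_and_signal(int task_id) {
        run_queue.push_back(task_id);
        if (parked_workers > 0) {
            --parked_workers;
            ++active_workers;
        }
    }

    // Stand-in for the ready_to_run_in_worker_ignoresignal path:
    // enqueue the task and wake nobody.
    void enqueue_nosignal(int task_id) {
        run_queue.push_back(task_id);
    }
};

// Simulates `rounds` hand-offs of a single epoll_wait-like task and
// returns how many workers are awake afterwards.
std::size_t simulate_handoffs(std::size_t active, std::size_t parked,
                              int rounds, bool about_to_quit) {
    assert(active > 0);  // someone must be awake to steal the task
    ToyScheduler sched(active, parked);
    const int kEpollTask = 0;  // hypothetical id of the epoll_wait task
    for (int i = 0; i < rounds; ++i) {
        if (about_to_quit) {
            sched.enqueue_nosignal(kEpollTask);
        } else {
            sched.enqueue_and_signal(kEpollTask);
        }
        // An already-awake worker steals the task and runs it.
        assert(!sched.run_queue.empty());
        sched.run_queue.pop_front();
    }
    return sched.active_workers;
}
```

In this model, `simulate_handoffs(1, 7, 10, false)` returns 8 (every parked worker is eventually woken), while `simulate_handoffs(1, 7, 10, true)` returns 1 (the seven parked workers stay asleep), which mirrors the symptom described in the report.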
