Err... Actually, I just realized that my source files had changed, so that stack isn't entirely accurate. Here are proper stack traces:
#0  0x00007f6ddb4a1a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f6dcbf4471c in gpr_cv_wait (cv=0x7f6db13f8a90, mu=0x7f6a65b9cd90, abs_deadline=...) at src/core/lib/support/sync_posix.c:80
#2  0x00007f6dcbe9bea8 in begin_worker (pollset=0x7f6a65b9cd90, worker=0x7f6db13f8a70, worker_hdl=0x0, now=0x7f6db13f8a30, deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:747
#3  0x00007f6dcbe9c7c9 in pollset_work (exec_ctx=0x7f6db13f8b80, ps=0x7f6a65b9cd90, worker_hdl=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:941
#4  0x00007f6dcbea9691 in grpc_pollset_work (exec_ctx=0x7f6db13f8b80, pollset=0x7f6a65b9cd90, worker=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_posix.c:210
#5  0x00007f6dcbecc1a8 in cq_next (cq=0x7f6a65b9cc80, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:914
#6  0x00007f6dcbecc5d2 in grpc_completion_queue_next (cq=0x7f6a65b9cc80, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:992
#7  0x00007f6dcc394ecc in HPHP::c_Call_ni_startBatch (this_=0x7f6a9288a670, actions=...) at /opt/build/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:583

and

#0  0x00007f6dd56b6d43 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f6dcbe9b9e5 in do_epoll_wait (exec_ctx=0x7f6d99ff8b80, ps=0x7f6a65867710, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:640
#2  0x00007f6dcbe9c972 in pollset_work (exec_ctx=0x7f6d99ff8b80, ps=0x7f6a65867710, worker_hdl=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_epoll1_linux.c:964
#3  0x00007f6dcbea9691 in grpc_pollset_work (exec_ctx=0x7f6d99ff8b80, pollset=0x7f6a65867710, worker=0x0, now=..., deadline=...) at src/core/lib/iomgr/ev_posix.c:210
#4  0x00007f6dcbecc1a8 in cq_next (cq=0x7f6a65867600, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:914
#5  0x00007f6dcbecc5d2 in grpc_completion_queue_next (cq=0x7f6a65867600, deadline=..., reserved=0x0) at src/core/lib/surface/completion_queue.c:992
#6  0x00007f6dcc394ecc in HPHP::c_Call_ni_startBatch (this_=0x7f6a624e0670, actions=...) at /opt/build/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:583

These are both proper examples of it just timing out on epoll_wait/pthread_cond_timedwait.

The other thing I've noticed on the bad request is this:

D0919 14:54:36.312349629   51030 combiner.c:163] C:0x7f54faf35c00 grpc_combiner_execute c=0x7f4c22edbfb0 last=61995

last is 61995, which seems to indicate the number of scheduled items in the queue. In a normal request this number is kept low.

On Tuesday, September 19, 2017 at 9:19:11 AM UTC-7, Michael Lumish wrote:
>
> It looks like your stack trace includes a call to grpc_server_request_registered_call with a NULL server. That is suspicious.
>
> On Tue, Sep 19, 2017 at 8:41 AM ryan via grpc.io <[email protected]> wrote:
>
>> Hello all -
>>
>> I've been developing a gRPC-HHVM extension <https://github.com/grpc/grpc/pull/11553>. In the process of stress testing this extension, we've consistently seen a condition where gRPC gets into a state where any request that is made gets indefinitely stuck on the epoll_wait syscall (or times out, if a timeout is set on grpc_completion_queue_next). Everything works fine and is chugging along, and then this problem happens and all of the requests get stuck/timeout.
>> This is what it looks like in gdb:
>>
>> Thread 21 (Thread 0x7f54f6bff700 (LWP 51033)):
>> #0  0x00007f554ad90d43 in epoll_wait () from /lib64/libc.so.6   <-- indefinitely stuck, or times out if one is set
>> #1  0x00007f55416969e5 in add_poll_object (exec_ctx=0x2482ad88, bag=0x188924, bag_type=(POLL_OBJ_POLLSET | unknown: 32596), item=0x0, item_type=(POLL_OBJ_POLLSET | unknown: 1097427428)) at src/core/lib/iomgr/ev_epollsig_linux.c:1484
>> #2  0x00007f5541697972 in fd_ref (fd=0x4, reason=0x33b8b87c0 <Address 0x33b8b87c0 out of bounds>, file=0x0, line=0) at src/core/lib/iomgr/ev_poll_posix.c:447
>> #3  0x00007f55416a4691 in tcp_drop_uncovered_then_handle_write (exec_ctx=0x7f54f6bf8680, arg=0x7f4c4e2c9010, error=0x0) at src/core/lib/iomgr/tcp_posix.c:239
>> #4  0x00007f55416c71a8 in grpc_server_request_registered_call (server=0x0, rmp=0x0, call=0x2482ad88, deadline=0x188924, initial_metadata=0x7f4c4e2c8f00, optional_payload=0x0, cq_bound_to_call=0x4, cq_for_notification=0x33b8b87c0, tag=0x0) at src/core/lib/surface/server.c:1486
>> #5  0x00007f55416c75d2 in conforms_to (slice=..., legal_bits=0x7f4c22ce2e50 "\270\064\245\214L\177", err_desc=0x7f54f6bf8701 "\211\030") at src/core/lib/surface/validate_metadata.c:40
>> #6  0x00007f5541b97105 in HPHP::c_Call_ni_startBatch (this_=0x7f4c22ce2e60, actions=...) at /home/ryangordon/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:529
>>
>> The stack trace doesn't always look like that, but it is always stuck on epoll_wait/epoll_pwait, often from gpr_cv_wait.
>>
>> I've managed to get GRPC_TRACE=all GRPC_VERBOSITY=DEBUG logs of a good request and of a request in the bad state, to compare side by side.
>> Good request:
>> https://gist.githubusercontent.com/RyanGordon/831a8213e735bb1a4e5b93a854aab3c8/raw/7e37d595dbe56a77274bff83b64ab661a80e0c27/Not-Frozen%2520Log%2520Snippet
>>
>> Bad request:
>> https://gist.githubusercontent.com/RyanGordon/831a8213e735bb1a4e5b93a854aab3c8/raw/7e37d595dbe56a77274bff83b64ab661a80e0c27/Frozen%2520log%2520snippet
>>
>> The bad request looks like this when its execution deviates:
>>
>> D0919 14:54:36.312349629   51030 combiner.c:163]         C:0x7f54faf35c00 grpc_combiner_execute c=0x7f4c22edbfb0 last=61995
>> D0919 14:54:36.312353011   51030 exec_ctx.c:73]          closure 0x7f4c22edbec0 finished
>> I0919 14:54:36.312358626   51030 completion_queue.c:834] grpc_completion_queue_next(cq=0x7f4c22e9e200, deadline=gpr_timespec { tv_sec: 4, tv_nsec: 999000000, clock_type: 3 }, reserved=(nil))
>> ...
>>
>> Whereas the good request does this:
>>
>> D0915 15:19:50.178995777   13689 combiner.c:163]  C:0x7f932172f700 grpc_combiner_execute c=0x7f9313d67fb0 last=1
>> D0915 15:19:50.179005880   13689 exec_ctx.c:73]   closure 0x7f9313d67ec0 finished
>> D0915 15:19:50.179074826   13689 combiner.c:225]  C:0x7f932172f700 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0
>> D0915 15:19:50.179077765   13689 combiner.c:243]  C:0x7f932172f700 maybe_finish_one n=0x7f9313d67fb0
>> ... (and a whole lot more)
>>
>> Effectively, it seems like something is not getting scheduled when it should be (grpc_combiner_continue_exec_ctx never gets run). I've been working on this for a few weeks now and haven't been able to track down why, or a fix. I have the full logs if they might be helpful. I've been testing on my branch, which is kept pretty close to grpc/grpc master.
>>
>> Does anyone have any idea what is going on? I have the bad process open in gdb on a test server, and have timeouts implemented so I can run repeated requests at it to continue debugging if necessary.
>>
>> Thanks,
>>
>> Ryan
>>
>> --
>> You received this message because you are subscribed to the Google Groups "grpc.io" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/grpc-io.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/b3cc406f-9b93-4259-b8f9-df3eb904132a%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
