It looks like your stack trace includes a call to
grpc_server_request_registered_call
with a NULL server. That is suspicious.
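For reference, a registered-call request normally goes through a valid,
started server handle; the method handle is whatever
grpc_server_register_method() returned before grpc_server_start(). A minimal
sketch against the public gRPC core C API (pending_call and request_one are
illustrative names here, not your extension's actual code):

#include <grpc/grpc.h>
#include <grpc/support/time.h>

/* Storage for the out-parameters; it must outlive the request because the
 * core fills these in asynchronously when the tag pops out of the queue. */
typedef struct {
  grpc_call* call;
  gpr_timespec deadline;
  grpc_metadata_array initial_metadata;
  grpc_byte_buffer* optional_payload;
} pending_call; /* illustrative name */

static grpc_call_error request_one(grpc_server* server, void* method_handle,
                                   grpc_completion_queue* cq,
                                   pending_call* pc) {
  grpc_metadata_array_init(&pc->initial_metadata);
  pc->call = NULL;
  pc->optional_payload = NULL;
  /* `server` must be non-NULL and already started. */
  return grpc_server_request_registered_call(
      server, method_handle, &pc->call, &pc->deadline, &pc->initial_metadata,
      &pc->optional_payload, cq /* cq_bound_to_call */,
      cq /* cq_for_notification */, pc /* tag */);
}

If server really is NULL at that point, the request can never be matched to
an incoming RPC, which would be consistent with the completion queue having
nothing to deliver.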

On Tue, Sep 19, 2017 at 8:41 AM ryan via grpc.io <[email protected]>
wrote:

>
> Hello all -
>
> I've been developing a gRPC-HHVM extension
> <https://github.com/grpc/grpc/pull/11553>. In the process of stress
> testing this extension, we've consistently hit a state where every request
> gets stuck indefinitely on the epoll_wait syscall (or times out, if a
> deadline is set on grpc_completion_queue_next; a sketch of that polling
> loop follows the gdb trace below). Everything runs fine for a while, and
> then this problem hits and every subsequent request gets stuck or times
> out. This is what it looks like in gdb:
>
> Thread 21 (Thread 0x7f54f6bff700 (LWP 51033)):
>
> #0  0x00007f554ad90d43 in epoll_wait () from /lib64/libc.so.6   <--
> indefinitely stuck, or times out if one is set
>
> #1  0x00007f55416969e5 in add_poll_object (exec_ctx=0x2482ad88,
> bag=0x188924, bag_type=(POLL_OBJ_POLLSET | unknown: 32596), item=0x0,
> item_type=(POLL_OBJ_POLLSET | unknown: 1097427428)) at
> src/core/lib/iomgr/ev_epollsig_linux.c:1484
>
> #2  0x00007f5541697972 in fd_ref (fd=0x4, reason=0x33b8b87c0 <Address
> 0x33b8b87c0 out of bounds>, file=0x0, line=0) at
> src/core/lib/iomgr/ev_poll_posix.c:447
>
> #3  0x00007f55416a4691 in tcp_drop_uncovered_then_handle_write
> (exec_ctx=0x7f54f6bf8680, arg=0x7f4c4e2c9010, error=0x0) at
> src/core/lib/iomgr/tcp_posix.c:239
>
> #4  0x00007f55416c71a8 in grpc_server_request_registered_call
> (server=0x0, rmp=0x0, call=0x2482ad88, deadline=0x188924,
> initial_metadata=0x7f4c4e2c8f00, optional_payload=0x0,
> cq_bound_to_call=0x4, cq_for_notification=0x33b8b87c0,
>
>     tag=0x0) at src/core/lib/surface/server.c:1486
>
>
> #5  0x00007f55416c75d2 in conforms_to (slice=...,
> legal_bits=0x7f4c22ce2e50 "\270\064\245\214L\177", err_desc=0x7f54f6bf8701
> "\211\030") at src/core/lib/surface/validate_metadata.c:40
>
> #6  0x00007f5541b97105 in HPHP::c_Call_ni_startBatch
> (this_=0x7f4c22ce2e60, actions=...) at
> /home/ryangordon/hhvm/grpc/src/php/ext/grpc/hhvm/call.cpp:529
>
>
> The stack trace doesn't always look like that, but it is always stuck in
> epoll_wait/epoll_pwait, often reached via gpr_cv_wait.
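>
> For context, this is roughly the pattern in play: the extension waits on
> the completion queue with an optional deadline, so a hung request
> eventually times out instead of blocking forever. A minimal sketch using
> the standard gRPC core C API (poll_cq and timeout_secs are illustrative
> names, not the exact call.cpp code):
>
> #include <grpc/grpc.h>
> #include <grpc/support/time.h>
>
> /* Wait for one completion event, or give up after timeout_secs
>  * (<= 0 means wait forever). */
> static grpc_event poll_cq(grpc_completion_queue* cq, int timeout_secs) {
>   gpr_timespec deadline =
>       timeout_secs > 0
>           ? gpr_time_add(gpr_now(GPR_CLOCK_MONOTONIC),
>                          gpr_time_from_seconds(timeout_secs, GPR_TIMESPAN))
>           : gpr_inf_future(GPR_CLOCK_MONOTONIC);
>
>   /* In the bad state this blocks inside epoll_wait until the deadline
>    * expires and then returns a GRPC_QUEUE_TIMEOUT event instead of
>    * GRPC_OP_COMPLETE. */
>   return grpc_completion_queue_next(cq, deadline, NULL /* reserved */);
> }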
>
>
> I've managed to capture GRPC_TRACE=all GRPC_VERBOSITY=DEBUG logs of a good
> request and of a request in the bad state, to compare side by side.
>
>
> Good Request:
> https://gist.githubusercontent.com/RyanGordon/831a8213e735bb1a4e5b93a854aab3c8/raw/7e37d595dbe56a77274bff83b64ab661a80e0c27/Not-Frozen%2520Log%2520Snippet
>
> Bad Request:
> https://gist.githubusercontent.com/RyanGordon/831a8213e735bb1a4e5b93a854aab3c8/raw/7e37d595dbe56a77274bff83b64ab661a80e0c27/Frozen%2520log%2520snippet
>
>
> The bad request looks like this when its execution deviates:
>
>
> D0919 14:54:36.312349629   51030 combiner.c:163]             C:0x7f54faf35c00 
> grpc_combiner_execute c=0x7f4c22edbfb0 last=61995
> D0919 14:54:36.312353011   51030 exec_ctx.c:73]              closure 
> 0x7f4c22edbec0 finished
> I0919 14:54:36.312358626   51030 completion_queue.c:834]     
> grpc_completion_queue_next(cq=0x7f4c22e9e200, deadline=gpr_timespec { tv_sec: 
> 4, tv_nsec: 999000000, clock_type: 3 }, reserved=(nil))
>
> ...
>
>
> Whereas, the good request does this:
>
> D0915 15:19:50.178995777   13689 combiner.c:163]             C:0x7f932172f700 
> grpc_combiner_execute c=0x7f9313d67fb0 last=1
> D0915 15:19:50.179005880   13689 exec_ctx.c:73]              closure 
> 0x7f9313d67ec0 finished
> D0915 15:19:50.179074826   13689 combiner.c:225]             C:0x7f932172f700 
> grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 
> time_to_execute_final_list=0
> D0915 15:19:50.179077765   13689 combiner.c:243]             C:0x7f932172f700 
> maybe_finish_one n=0x7f9313d67fb0
>
> ... (and a whole lot more)
>
>
>
>
> Effectively, it seems like something is not getting scheduled when it
> should be (grpc_combiner_continue_exec_ctx never gets run). I've been
> working on this for a few weeks now and haven't been able to track down the
> cause or a fix. I have the full logs if they would be helpful. I've been
> testing on my branch, which is kept fairly close to grpc/grpc master.
>
>
> Does anyone have any idea what is going on? I have the bad process open on a 
> test server in gdb and have timeouts implemented so I can run repeated 
> requests at it to continue debugging if necessary.
>
>
> Thanks,
>
> Ryan
