It sounds similar to the bug reported here?

https://github.com/grpc/grpc/issues/16427


On Mon, Oct 22, 2018 at 5:49 PM Abhishek Parmar <[email protected]>
wrote:

> Hi,
> On one of our servers we have a few grpc processes that talk to each
> other. Some processes are servers only (in C++) and others are both clients
> and servers(noodejs). The nodejs process are getting stuck after we reboot
> the server (the various processes get started about the same time by the
> init system). But if we restart/kill the processes they work fine.
>
> Here is a stack trace and debug log. Please let me know if this looks
> familiar, otherwise I will keep digging.
>
> Thanks
> Abhishek
>
> Backtrace from stuck nodejs process
>
> #0  0x00007fc359f8626d in __lll_lock_wait () from
> /lib/x86_64-linux-gnu/libpthread.so.0
>
> #1  0x00007fc359f7fdbd in pthread_mutex_lock () from
> /lib/x86_64-linux-gnu/libpthread.so.0
>
> #2  0x00007fc3445ad319 in gpr_mu_lock (mu=<optimized out>) at
> ../deps/grpc/src/core/lib/gpr/sync_posix.cc:47
>
> #3  0x00007fc34453747f in connected (arg=0x2758090, error=0x28d70a0) at
> ../deps/grpc/src/core/ext/transport/chttp2/client/chttp2_connector.cc:172
>
> #4  0x00007fc3445132c1 in exec_ctx_run (error=0x28d70a0,
> closure=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:40
>
> #5  grpc_core::ExecCtx::Flush (this=this@entry=0x7fffe53398e0) at
> ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:128
>
> #6  0x00007fc34459fffe in ~ExecCtx (this=0x7fffe53398e0,
> __in_chrg=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.h:108
>
> #7  custom_connect_callback (socket=<optimized out>, error=0x28d70a0) at
> ../deps/grpc/src/core/lib/iomgr/tcp_client_custom.cc:100
>
> #8  0x00007fc34453716a in chttp2_connector_connect (con=0x2758090,
> args=0x7fffe5339a50, result=0x299cde0, notify=0x299cdf0) at
> ../deps/grpc/src/core/ext/transport/chttp2/client/chttp2_connector.cc:215
>
> #9  0x00007fc3445457dc in continue_connect_locked (c=0x299cdb0) at
> ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:408
>
> #10 maybe_start_connecting_locked (c=0x299cdb0) at
> ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:483
>
> #11 0x00007fc344546415 in grpc_subchannel_notify_on_state_change
> (c=0x299cdb0, interested_parties=0xdeafbeef, state=0x2800bbc,
> notify=<optimized out>) at
> ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:532
>
> #12 0x00007fc344590e07 in grpc_core::SubchannelData<grpc_core::(anonymous
> namespace)::PickFirst::PickFirstSubchannelList, grpc_core::(anonymous
> namespace)::PickFirst::PickFirstSubchannelData>::StartConnectivityWatchLocked
> (
>
>     this=<optimized out>) at
> ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:304
>
> #13 0x00007fc344591c70 in StartPickingLocked (this=0x2800770) at
> ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:249
>
> #14 grpc_core::(anonymous namespace)::PickFirst::PickLocked
> (this=0x2800770, pick=0x28052a0) at
> ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:270
>
> #15 0x00007fc344540e2a in StartLocked (elem=<optimized out>) at
> ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2687
>
> #16 process_service_config_and_start_lb_pick_locked (elem=0x2805170) at
> ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2815
>
> #17 grpc_core::ResolverResultWaiter::DoneLocked (arg=0x28d26c0, error=0x0)
> at ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2920
>
> #18 0x00007fc344559332 in grpc_combiner_continue_exec_ctx () at
> ../deps/grpc/src/core/lib/iomgr/combiner.cc:260
>
> #19 0x00007fc3445132e4 in grpc_core::ExecCtx::Flush 
> (this=this@entry=0x7fffe5339c10)
> at ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:131
>
> #20 0x00007fc34455a64a in ~ExecCtx (this=0x7fffe5339c10,
> __in_chrg=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.h:108
>
> #21 grpc_custom_resolve_callback (r=0x27f3b40, result=0x2730ac0,
> error=<optimized out>) at
> ../deps/grpc/src/core/lib/iomgr/resolve_address_custom.cc:85
>
> #22 0x000000000143c395 in ?? ()
>
> #23 0x000000000143e44b in ?? ()
>
> #24 0x000000000144ffa8 in ?? ()
>
> #25 0x000000000143edd6 in uv_run ()
>
> #26 0x000000000122c1bd in node::Start(uv_loop_s*, int, char const* const*,
> int, char const* const*) ()
>
> #27 0x0000000001224d03 in node::Start(int, char**) ()
>
> #28 0x00007fc359bcc830 in __libc_start_main () from
> /lib/x86_64-linux-gnu/libc.so.6
>
> #29 0x00000000008aee41 in _start ()
>
>
> verbose grpc log
>
>
> 2018-10-22T16:15:39,402380574-0700 E: D1022 16:15:39.401908859    2103
> dns_resolver.cc:339]        Using native dns resolver
>
> 2018-10-22T16:15:39,402862016-0700 E: I1022 16:15:39.401936640    2103
> timer_manager.cc:85]        Spawn timer thread
>
> 2018-10-22T16:15:39,403382056-0700 E: I1022 16:15:39.401955746    2103
> init.cc:153]                grpc_init(void)
>
> 2018-10-22T16:15:39,403875574-0700 E: I1022 16:15:39.401960577    2198
> timer_manager.cc:235]       timers not checked: expect another thread to
>
> 2018-10-22T16:15:39,404425511-0700 E: I1022 16:15:39.401969527    2198
> timer_manager.cc:183]       sleep until kicked
>
> 2018-10-22T16:15:39,404906467-0700 E: I1022 16:15:39.402227144    2103
> completion_queue.cc:431]
> grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
>
> 2018-10-22T16:15:40,582796872-0700 E: I1022 16:15:40.582192594    2103
> server.cc:944]              grpc_server_create(0x2e29c70, (nil))
>
> 2018-10-22T16:15:40,583379934-0700 E: I1022 16:15:40.582212478    2103
> server.cc:930]
> grpc_server_register_completion_queue(server=0x2ddd2b0, cq=0x2de06b0,
> reserved=(nil))
>
> 2018-10-22T16:15:40,583888435-0700 E: Method handler requestAction for
> /c3d.AdminServiceInterface/RequestAction expected but not provided
>
> 2018-10-22T16:15:40,584442345-0700 E: I1022 16:15:40.582867963    2103
> server_chttp2.cc:34]
> grpc_server_add_insecure_http2_port(server=0x2ddd2b0, addr=0.0.0.0:1111)
>
> 2018-10-22T16:15:40,584957465-0700 E: I1022 16:15:40.582899401    2103
> tcp_server_custom.cc:380]   SERVER 0x2dbb660 add_port [::]:1111 error="No
> Error"
>
> 2018-10-22T16:15:40,590308691-0700 E: I1022 16:15:40.589690840    2103
> channel_create.cc:93]
>  grpc_insecure_channel_create(target=myteddy:1100, args=0x2eed230,
> reserved=(nil))
>
> 2018-10-22T16:15:40,591341636-0700 E: I1022 16:15:40.590835146    2103
> channel.cc:267]             grpc_channel_get_target(channel=0x2e29600)
>
> 2018-10-22T16:15:40,591929944-0700 E: I1022 16:15:40.591224466    2103
> metadata_array.cc:29]       grpc_metadata_array_init(array=0x2d3a048)
>
> 2018-10-22T16:15:40,592455185-0700 E: I1022 16:15:40.591235015    2103
> metadata_array.cc:29]       grpc_metadata_array_init(array=0x2d7fd48)
>
> 2018-10-22T16:15:40,592997632-0700 E: I1022 16:15:40.591240219    2103
> call.cc:1983]               grpc_call_start_batch(call=0x2ed3770,
> ops=0x2ea14d0, nops=2, tag=0x2d388f0, reserved=(nil))
>
> 2018-10-22T16:15:40,593465266-0700 E: I1022 16:15:40.591244610    2103
> call.cc:1593]               ops[0]: SEND_INITIAL_METADATA(nil)
>
> 2018-10-22T16:15:40,593928046-0700 E: I1022 16:15:40.591247538    2103
> call.cc:1593]               ops[1]: RECV_INITIAL_METADATA ptr=0x2d7fd48
>
> 2018-10-22T16:15:40,594908524-0700 E: I1022 16:15:40.591251816    2103
> call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780]
> closure=0x2ed4690 [executing batch] error="No Error"
>
> 2018-10-22T16:15:40,595398831-0700 E: I1022 16:15:40.591254354    2103
> call_combiner.cc:76]          size: 0 -> 1
>
> 2018-10-22T16:15:40,595937480-0700 E: I1022 16:15:40.591256477    2103
> call_combiner.cc:85]          EXECUTING IMMEDIATELY
>
> 2018-10-22T16:15:40,596469538-0700 E: I1022 16:15:40.591263238    2103
> call.cc:638]                OP[client-channel:0x2ed4110]:
> SEND_INITIAL_METADATA{key=3a 70 61 74 68 ':path' value=2f 63 33 64 2e 7a 69
> 6e 63 2e 5a 69 6e 63 2f 41 63 74 69 6f 6e 53 74 72 65 61 6d
> '/c3d.zinc.Zinc/ActionStream'} RECV_INITIAL_METADATA
>
> 2018-10-22T16:15:40,597051866-0700 E: I1022 16:15:40.591266381    2103
> client_channel.cc:1116]     chand=0x2e296c0 calld=0x2ed4130: adding pending
> batch at index 0
>
> 2018-10-22T16:15:40,597583780-0700 E: I1022 16:15:40.591268938    2103
> client_channel.cc:3064]     chand=0x2e296c0 calld=0x2ed4130: entering
> client_channel combiner
>
> 2018-10-22T16:15:40,598104257-0700 E: I1022 16:15:40.591274987    2103
> client_channel.cc:218]      chand=0x2e296c0: starting name resolution
>
> 2018-10-22T16:15:40,599053615-0700 E: D1022 16:15:40.591277545    2103
> dns_resolver.cc:280]        Start resolving.
>
> 2018-10-22T16:15:40,599580345-0700 E: I1022 16:15:40.591284535    2103
> client_channel.cc:2828]     chand=0x2e296c0 calld=0x2ed4130: deferring pick
> pending resolver result
>
> 2018-10-22T16:15:40,600090104-0700 E: I1022 16:15:40.591287019    2103
> call_combiner.cc:165]       call_combiner=0x2ed3780: setting
> notify_on_cancel=0x2de2850
>
> 2018-10-22T16:15:40,600662796-0700 E: I1022 16:15:40.591299744    2103
> metadata_array.cc:29]       grpc_metadata_array_init(array=0x2da7898)
>
> 2018-10-22T16:15:40,601247786-0700 E: I1022 16:15:40.591303804    2103
> call.cc:1983]               grpc_call_start_batch(call=0x2ed3770,
> ops=0x2eed050, nops=1, tag=0x2d23af0, reserved=(nil))
>
> 2018-10-22T16:15:40,601765545-0700 E: I1022 16:15:40.591307696    2103
> call.cc:1593]               ops[0]: RECV_STATUS_ON_CLIENT
> metadata=0x2da7898 status=0x2da78b0 details=0x2da78b8
>
> 2018-10-22T16:15:40,602248272-0700 E: I1022 16:15:40.591310486    2103
> call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780]
> closure=0x2ed4790 [executing batch] error="No Error"
>
> 2018-10-22T16:15:40,602770292-0700 E: I1022 16:15:40.591312823    2103
> call_combiner.cc:76]          size: 1 -> 2
>
> 2018-10-22T16:15:40,603249135-0700 E: I1022 16:15:40.591315028    2103
> call_combiner.cc:91]          QUEUING
>
> 2018-10-22T16:15:40,603754515-0700 E: I1022 16:15:40.596813629    2103
> server.cc:1024]             grpc_server_start(server=0x2ddd2b0)
>
> 2018-10-22T16:15:40,604252793-0700 E: I1022 16:15:40.596825047    2103
> executor.cc:200]            EXECUTOR schedule 0x2ed5650 inline
>
> 2018-10-22T16:15:40,604735654-0700 E: I1022 16:15:40.596829255    2103
> tcp_server_custom.cc:422]   SERVER_START 0x2dbb660
>
> 2018-10-22T16:15:40,605185438-0700 E: I1022 16:15:40.596835319    2103
> call_details.cc:31]         grpc_call_details_init(cd=0x2dabdf0)
>
> 2018-10-22T16:15:40,605648237-0700 E: I1022 16:15:40.596838124    2103
> metadata_array.cc:29]       grpc_metadata_array_init(array=0x2dabe50)
>
> 2018-10-22T16:15:40,606157918-0700 E: I1022 16:15:40.596843515    2103
> server.cc:1352]             grpc_server_request_call(server=0x2ddd2b0,
> call=0x2dabde8, details=0x2dabdf0, initial_metadata=0x2dabe50,
> cq_bound_to_call=0x2de06b0, cq_for_notification=0x2de06b0, tag=0x2dbd1d0)
>
> 2018-10-22T16:15:40,639462920-0700 E: Mon, 22 Oct 2018 23:15:40 GMT
> sequelize deprecated String based operators are now deprecated. Please use
> Symbol based operators for better security, read more at
> http://docs.sequelizejs.com/manual/tutorial/querying.html#operators at
> ../../node_modules/sequelize/lib/sequelize.js:242:13
>
> 2018-10-22T16:15:40,644245283-0700 E: (node:2103) DeprecationWarning:
> grpc.load: Use the @grpc/proto-loader module with
> grpc.loadPackageDefinition instead
>
> 2018-10-22T16:15:40,644791789-0700 E: (node:2103) DeprecationWarning:
> Server#addProtoService: Use Server#addService instead
>
> 2018-10-22T16:15:40,645336094-0700 E: I1022 16:15:40.643930321    2103
> call.cc:1983]               grpc_call_start_batch(call=0x2ed3770,
> ops=0x2ea14d0, nops=1, tag=0x2da4260, reserved=(nil))
>
> 2018-10-22T16:15:40,645932732-0700 E: I1022 16:15:40.643939646    2103
> call.cc:1593]               ops[0]: RECV_MESSAGE ptr=0x2da4298
>
> 2018-10-22T16:15:40,646470991-0700 E: I1022 16:15:40.643952248    2103
> call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780]
> closure=0x2ed4890 [executing batch] error="No Error"
>
> 2018-10-22T16:15:40,647038084-0700 E: I1022 16:15:40.643954807    2103
> call_combiner.cc:76]          size: 2 -> 3
>
> 2018-10-22T16:15:40,647569655-0700 E: I1022 16:15:40.643956732    2103
> call_combiner.cc:91]          QUEUING
>
> 2018-10-22T16:15:40,648086615-0700 E: I1022 16:15:40.647123516    2103
> call.cc:1983]               grpc_call_start_batch(call=0x2ed3770,
> ops=0x2ea4370, nops=1, tag=0x2ee2140, reserved=(nil))
>
> 2018-10-22T16:15:40,648577680-0700 E: I1022 16:15:40.647131625    2103
> call.cc:1593]               ops[0]: SEND_MESSAGE ptr=0x2d6bee0
>
> 2018-10-22T16:15:40,649081272-0700 E: I1022 16:15:40.647135137    2103
> call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780]
> closure=0x2ed4990 [executing batch] error="No Error"
>
> 2018-10-22T16:15:40,649534561-0700 E: I1022 16:15:40.647136958    2103
> call_combiner.cc:76]          size: 3 -> 4
>
> 2018-10-22T16:15:40,649982921-0700 E: I1022 16:15:40.647138389    2103
> call_combiner.cc:91]          QUEUING
>
> 2018-10-22T16:15:40,650453200-0700 E: I1022 16:15:40.647217372    2103
> completion_queue.cc:851]    grpc_completion_queue_next(cq=0x2de06b0,
> deadline=gpr_timespec { tv_sec: -9223372036854775808, tv_nsec: 0,
> clock_type: 0 }, reserved=(nil))
>
> 2018-10-22T16:15:40,650934867-0700 E: I1022 16:15:40.647225719    2103
> completion_queue.cc:951]    RETURN_EVENT[0x2de06b0]: QUEUE_TIMEOUT
>
> 2018-10-22T16:15:40,651414350-0700 E: I1022 16:15:40.647566577    2103
> client_channel.cc:485]      chand=0x2e296c0: got resolver result:
> resolver_result=0x2d80140 error="No Error"
>
> 2018-10-22T16:15:40,651904090-0700 E: I1022 16:15:40.647574780    2103
> pick_first.cc:154]          Pick First 0x2db0d10 created.
>
> 2018-10-22T16:15:40,652397957-0700 E: I1022 16:15:40.647578030    2103
> pick_first.cc:382]          Pick First 0x2db0d10 received update with 1
> addresses
>
> 2018-10-22T16:15:40,652877632-0700 E: I1022 16:15:40.647580977    2103
> subchannel_list.h:458]      [pick_first 0x2db0d10] Creating subchannel list
> 0x2ea39f0 for 1 subchannels
>
> 2018-10-22T16:15:40,653356788-0700 E: I1022 16:15:40.647619797    2103
> subchannel_list.h:498]      [pick_first 0x2db0d10] subchannel list
> 0x2ea39f0 index 0: Created subchannel 0x2d86ed0 for address uri ipv4:
> 10.12.61.201:1100
>
> 2018-10-22T16:15:40,653804302-0700 E: I1022 16:15:40.647624418    2103
> client_channel.cc:398]      chand=0x2e296c0: created new LB policy
> "pick_first" (0x2db0d10)
>
> 2018-10-22T16:15:40,654245777-0700 E: I1022 16:15:40.647627929    2103
> connectivity_state.cc:92]   CONWATCH: 0x2db0ef8 pick_first: get IDLE
>
> 2018-10-22T16:15:40,654759956-0700 E: I1022 16:15:40.647630687    2103
> connectivity_state.cc:116]  CONWATCH: 0x2db0ef8 pick_first: from IDLE
> [cur=IDLE] notify=0x2d871a8
>
> 2018-10-22T16:15:40,655252548-0700 E: I1022 16:15:40.647634030    2103
> client_channel.cc:177]      chand=0x2e296c0: setting connectivity state to
> IDLE
>
> 2018-10-22T16:15:40,655740066-0700 E: I1022 16:15:40.647636773    2103
> connectivity_state.cc:164]  SET: 0x2e29748 client_channel: IDLE --> IDLE
> [resolver_result] error=(nil) "No Error"
>
> 2018-10-22T16:15:40,656232847-0700 E: I1022 16:15:40.647639358    2103
> client_channel.cc:2917]     chand=0x2e296c0 calld=0x2ed4130: resolver
> returned, doing LB pick
>
> 2018-10-22T16:15:40,656696185-0700 E: I1022 16:15:40.647641353    2103
> client_channel.cc:2756]     chand=0x2e296c0 calld=0x2ed4130: applying
> service config to call
>
> 2018-10-22T16:15:40,657147226-0700 E: I1022 16:15:40.647642836    2103
> client_channel.cc:2665]     chand=0x2e296c0 calld=0x2ed4130: starting pick
> on lb_policy=0x2db0d10
>
> 2018-10-22T16:15:40,657601911-0700 E: I1022 16:15:40.647645073    2103
> subchannel_list.h:292]      [pick_first 0x2db0d10] subchannel list
> 0x2ea39f0 index 0 of 1 (subchannel 0x2d86ed0): starting watch: requesting
> connectivity change notification (from IDLE)
>
> 2018-10-22T16:15:40,658032165-0700 E: I1022 16:15:40.647646950    2103
> connectivity_state.cc:116]  CONWATCH: 0x2d86fa0 subchannel: from IDLE
> [cur=IDLE] notify=0x2d38578
>
> 2018-10-22T16:15:40,658462296-0700 E: I1022 16:15:40.647649274    2103
> connectivity_state.cc:164]  SET: 0x2d86fa0 subchannel: IDLE --> CONNECTING
> [state_change] error=(nil) "No Error"
>
> 2018-10-22T16:15:40,658876797-0700 E: I1022 16:15:40.647650773    2103
> connectivity_state.cc:190]  NOTIFY: 0x2d86fa0 subchannel: 0x2d38578
>
> 2018-10-22T16:15:40,659315104-0700 E: I1022 16:15:40.647657057    2103
> tcp_client_custom.cc:139]   CLIENT_CONNECT: 0x2da77c0 ipv4:
> 10.12.61.201:1100: asynchronously connecting
>
> 2018-10-22T16:15:40,659745439-0700 E: I1022 16:15:40.647676597    2103
> tcp_client_custom.cc:69]    CLIENT_CONNECT: ipv4:10.12.61.201:1100:
> on_alarm: error="Cancelled"
>
>
>
> --
-Abhishek

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/CAD2hskxrogb0QfPjc27fVxd6S%2B16DV8H0pri-MXjDpK1u%2BnNew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to