Our situation is that when we try to close the app without shutting the gRPC
client down, the app hangs.
Of course, if we shut down the gRPC client manually several seconds before
quitting the app, things work as normal.
----
We are using the latest gRPC C# version. I notice that the Shutdown method
is an async method.
When the "OnApplicationQuit" event is caught and we try to invoke gRPC's
Shutdown before the app actually quits,
the gRPC ThreadPool seems to block the quitting process.
----
We've tried 2 workarounds, and both work:
1. Catch OnApplicationQuit, invoke gRPC's Shutdown, have the application
wait for 2 seconds, then close the app.
2. Every time the client finishes calling a remote procedure on the gRPC
server, shut down the gRPC client immediately.
On Friday, March 4, 2022 at 9:18:19 PM UTC+8 Anh Tran wrote:
> I have some time to revisit this problem today. Running with GRPC_VERBOSITY
> set to DEBUG gives me this (anything starting with [2022-03-04T23 is my own
> logging):
>
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] Logger level is set at info.
> [2022-03-04T23:15:41+10:00] [workflow_client] [warning] [process 501457,
> thread 501457] Persistent not specified and will default to false
> D0304 23:15:41.640009936 501457 ev_posix.cc:172] Using polling
> engine: epollex
> D0304 23:15:41.640079032 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "grpclb"
> D0304 23:15:41.640106084 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "priority_experimental"
> D0304 23:15:41.640124027 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "weighted_target_experimental"
> D0304 23:15:41.640127329 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "pick_first"
> D0304 23:15:41.640150128 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "round_robin"
> D0304 23:15:41.640153574 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "ring_hash_experimental"
> D0304 23:15:41.640162650 501457 dns_resolver_ares.cc:624] Using ares
> dns resolver
> D0304 23:15:41.640230087 501457 certificate_provider_registry.cc:33]
> registering certificate provider factory for "file_watcher"
> D0304 23:15:41.640236782 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "cds_experimental"
> D0304 23:15:41.640241375 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "xds_cluster_impl_experimental"
> D0304 23:15:41.640258747 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "xds_cluster_resolver_experimental"
> D0304 23:15:41.640262652 501457 lb_policy_registry.cc:42] registering
> LB policy factory for "xds_cluster_manager_experimental"
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] gRPC client at localhost:4500 is registering with the
> server...
> I0304 23:15:41.640909979 501458 subchannel.cc:945] subchannel
> 0x55e6713dcea0 {address=ipv6:[::1]:4500,
> args=grpc.client_channel_factory=0x55e6713d6a20,
> grpc.default_authority=localhost:4500,
> grpc.internal.subchannel_pool=0x55e6713bb810,
> grpc.primary_user_agent=grpc-c++/1.44.0-dev,
> grpc.resource_quota=0x55e6713d7b40, grpc.server_uri=dns:///localhost:4500}:
> connect failed: {"created":"@1646399741.640833261","description":"Failed to
> connect to remote host: Connection
> refused","errno":111,"file":"/home/hoover/workspace/DPAV/Linux/OpenSource/gRPC-focal/src/core/lib/iomgr/tcp_client_posix.cc","file_line":200,"os_error":"Connection
> refused","syscall":"connect","target_address":"ipv6:[::1]:4500"}
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] ... gRPC client at localhost:4500 successfully registered
> with server
> I0304 23:15:41.642209150 501462 server_builder.cc:327] Synchronous
> server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
> I0304 23:15:41.642437546 501462 socket_utils_common_posix.cc:353]
> TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] JobClient client constructed.
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] HelloWorld client constructed.
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] New id assigned:
> localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> is trying to register client for job type: : hello_world
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501463] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> registered with Job Server 4501
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501457] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> received ACK for RegisterNewClient.
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> identified JSON message as issueNewJob.
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> accepted job.
> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457,
> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> acknowledged the job.
> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457,
> thread 501485] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> identified JSON message as jobProgressUpdateAck.
> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457,
> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> received ACK for final JobProgressUpdate.
> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457,
> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
> finished processing a job and now shutting down.
> D0304 23:15:42.482888232 501484 chttp2_transport.cc:1715] ipv4:
> 127.0.0.1:56916: Sending goaway
> err={"created":"@1646399742.482860742","description":"Server
> shutdown","file":"/home/hoover/workspace/DPAV/Linux/OpenSource/gRPC-focal/src/core/lib/surface/server.cc","file_line":481,"grpc_status":0}
>
> On Monday, February 14, 2022 at 1:37:12 PM UTC+10 Anh Tran wrote:
>
>> Hi team,
>>
>> Yes. that is the case. I am also confused...
>>
>>
>> On Saturday, February 12, 2022 at 9:00:29 AM UTC+10 [email protected]
>> wrote:
>>
>>> Hi Anh,
>>>
>>> Thanks for the stack trace, but I'm a bit confused. Is this what you got
>>> while your program was stuck waiting forever? It looks a bit off, because
>>> Valgrind complains about "Conditional jump or move depends on uninitialised
>>> value" and the call stack appears to show a deadlock check inside.
>>>
>>> On Thursday, February 10, 2022 at 1:10:12 AM UTC-8 Anh Tran wrote:
>>>
>>>> Hi @google.com,
>>>>
>>>> Thanks for getting back to me at this. Please see my stack trace from
>>>> valgrind:
>>>>
>>>> ==287419== Memcheck, a memory error detector
>>>> ==287419== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et
>>>> al.
>>>> ==287419== Using Valgrind-3.18.1 and LibVEX; rerun with -h for
>>>> copyright info
>>>> ==287419== Command:
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli --job-type
>>>> slam
>>>> ==287419== Parent PID: 287418
>>>> ==287419==
>>>> ==287419== Conditional jump or move depends on uninitialised value(s)
>>>> ==287419== at 0x91743A: void** NextStackFrame<true, false>(void**,
>>>> void const*, unsigned long, unsigned long) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x916F98: int UnwindImpl<false, false>(void**, int*,
>>>> int, int, void const*, int*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x916D8D: absl::lts_20211102::GetStackTrace(void**,
>>>> int, int) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F27C: absl::lts_20211102::(anonymous
>>>> namespace)::GetStack(void**, int) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x924569:
>>>> absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId,
>>>> int, int (*)(void**, int)) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F342:
>>>> absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F84C:
>>>> absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90FB6E: absl::lts_20211102::Mutex::Lock() (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x8A3AD0: gpr_mu_lock (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x7DFA25: begin_worker(grpc_pollset*,
>>>> grpc_pollset_worker*, grpc_pollset_worker**, long) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x7DFFDA: pollset_work(grpc_pollset*,
>>>> grpc_pollset_worker**, long) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x7716F4: pollset_work(grpc_pollset*,
>>>> grpc_pollset_worker**, long) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== Uninitialised value was created by a stack allocation
>>>> ==287419== at 0x1C3FD0:
>>>> Emesent::GRPCQueue::Stub::Register(grpc::ClientContext*, Emesent::Empty
>>>> const&, Emesent::RegistrationResponse*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419==
>>>> ==287419== Thread 7 grpcpp_sync_ser:
>>>> ==287419== Conditional jump or move depends on uninitialised value(s)
>>>> ==287419== at 0x916F7F: int UnwindImpl<false, false>(void**, int*,
>>>> int, int, void const*, int*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x916D8D: absl::lts_20211102::GetStackTrace(void**,
>>>> int, int) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F27C: absl::lts_20211102::(anonymous
>>>> namespace)::GetStack(void**, int) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x924569:
>>>> absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId,
>>>> int, int (*)(void**, int)) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F342:
>>>> absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90F84C:
>>>> absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x90FB6E: absl::lts_20211102::Mutex::Lock() (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x8A3AD0: gpr_mu_lock (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x5121A4: grpc_cq_begin_op(grpc_completion_queue*,
>>>> void*) (in /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x5199C4:
>>>> grpc_core::Server::ShutdownAndNotify(grpc_completion_queue*, void*) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x51CD63: grpc_server_shutdown_and_notify (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== by 0x231AA2: grpc::Server::ShutdownInternal(gpr_timespec)
>>>> (in /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419== Uninitialised value was created by a stack allocation
>>>> ==287419== at 0x1D37E0: std::_Function_handler<void
>>>> (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>> std::allocator<char> > const&, std::__cxx11::basic_string<char,
>>>> std::char_traits<char>, std::allocator<char> > const&),
>>>> workflow_client::GRPCIcomm::start_and_wait()::{lambda(std::__cxx11::basic_string<char,
>>>> std::char_traits<char>, std::allocator<char> > const&,
>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>> std::allocator<char> > const&)#1}>::_M_invoke(std::_Any_data const&,
>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>> std::allocator<char> > const&, std::__cxx11::basic_string<char,
>>>> std::char_traits<char>, std::allocator<char> > const&) (in
>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>> ==287419==
>>>>
>>>> In case this is interesting, this is how I start my server and wait
>>>> until shutdown:
>>>>
>>>> std::string server_addr_and_port = base_address_ + ":" + id_;
>>>> grpc::ServerBuilder server_builder;
>>>> server_builder.AddListeningPort(server_addr_and_port,
>>>>                                 grpc::InsecureServerCredentials());
>>>>
>>>> GRPCQueueImpl grpc_queue_service;
>>>> server_builder.RegisterService(&grpc_queue_service);
>>>> server_ = server_builder.BuildAndStart();
>>>> server_->Wait();
>>>>
>>>> On Thursday, February 10, 2022 at 9:08:39 AM UTC+10 [email protected]
>>>> wrote:
>>>>
>>>>> I have also experienced the same issue in my work environment. Our
>>>>> workaround was to use an N-second timeout, check a shutdown flag, and
>>>>> then call shutdown.
>>>>>
>>>>>
>>>>>
>>>>> On Feb 9, 2022 at 3:54 PM, <[email protected]' via grpc.io> wrote:
>>>>>
>>>>> Would you capture the call-stack when Shutdown function appears to
>>>>> hang?
>>>>>
>>>>> On Wednesday, February 2, 2022 at 10:24:14 PM UTC-8 Anh Tran wrote:
>>>>>
>>>>>> The last commit on my local copy of gRPC is 06 Jan 2022, so not that
>>>>>> old.
>>>>>> On Thursday, February 3, 2022 at 4:51:16 AM UTC+10 Easwar Swaminathan
>>>>>> wrote:
>>>>>>
>>>>>>> Have any of y'all seen this in a more recent version of gRPC?
>>>>>>> `1.30.2` is really old.
>>>>>>>
>>>>>>> On Tuesday, February 1, 2022 at 12:15:54 AM UTC-8 Prasanta Behera
>>>>>>> wrote:
>>>>>>>
>>>>>>>> No, I've not found any solution yet! I was trying to put together a
>>>>>>>> hello-world kind of example to reproduce it and open a ticket, but as
>>>>>>>> you said, with such simple examples it never reproduces...
>>>>>>>>
>>>>>>>> On Tuesday, February 1, 2022 at 1:42:43 PM UTC+5:30 Anh Tran wrote:
>>>>>>>>
>>>>>>>>> I should also add that this behavior happens when threading is involved.
>>>>>>>>>
>>>>>>>>> On Tuesday, February 1, 2022 at 3:45:04 PM UTC+10 Anh Tran wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately I came across the same problem.
>>>>>>>>>> When trying to isolate the problem (with a simple hello-world
>>>>>>>>>> server-client), the problem seems to go away, i.e. the server can be
>>>>>>>>>> shut down.
>>>>>>>>>>
>>>>>>>>>> But when I use it in my own project, this happens. And despite my
>>>>>>>>>> efforts to track down the problem, I can't seem to spot it.
>>>>>>>>>> This is weird, since this version of Shutdown should force a
>>>>>>>>>> shutdown after a certain period, as described in the docs.
>>>>>>>>>>
>>>>>>>>>> Prasanta, did you happen to find a solution to your problem?
>>>>>>>>>> Maybe it will give a hint for mine.
>>>>>>>>>>
>>>>>>>>>> On Thursday, January 20, 2022 at 10:02:43 PM UTC+10 Prasanta
>>>>>>>>>> Behera wrote:
>>>>>>>>>>
>>>>>>>>>>> gRPC version : 1.30.2 (c++)
>>>>>>>>>>> $ gcc --version
>>>>>>>>>>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
>>>>>>>>>>>
>>>>>>>>>>> We use gRPC in synchronous mode and stop our gRPC services with
>>>>>>>>>>> this code:
>>>>>>>>>>>
>>>>>>>>>>> m_server->Shutdown(std::chrono::system_clock::now() +
>>>>>>>>>>>                    std::chrono::seconds(1));
>>>>>>>>>>> m_server->Wait();
>>>>>>>>>>>
>>>>>>>>>>> On rare occasions (about once in 1000 tries) the Shutdown() call
>>>>>>>>>>> does NOT return and the process just hangs.
>>>>>>>>>>>
>>>>>>>>>>> Is there anything I'm missing to get a reliable shutdown of the
>>>>>>>>>>> server?
>>>>>>>>>>>
--
You received this message because you are subscribed to the Google Groups
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/grpc-io/ca34d6b1-1cb4-487f-b1d6-2699c59bb320n%40googlegroups.com.