That problem seems to go away when I upgrade to protobuf 3.20 for some 
reason.

On Tuesday, March 8, 2022 at 9:56:17 AM UTC+10 Anh Tran wrote:

> It seems like there is some deadlock bug, or some undocumented expected 
> behaviour, in the gRPC source code.
>
> However, for the life of me, I can't make a minimal example that 
> reproduces the problem.
>
>
> On Monday, March 7, 2022 at 1:06:32 PM UTC+10 Plin Chen wrote:
>
>> Our situation is that when we try to close the app without shutting the 
>> gRPC client down, the app hangs.
>> Of course, if we shut the gRPC client down manually several seconds before 
>> quitting the app, things go as normal.
>> ----
>> We are using the latest gRPC-CSharp version. I notice that the shutdown 
>> method is an async method. 
>> When the "OnApplicationQuit" event is caught and we try to invoke 
>> shutdown (gRPC) before the app really quits,
>> the gRPC thread pool seems to block the quitting process.
>> ----
>> We've tried 2 workarounds, and either one works:
>> 1. Catch OnApplicationQuit, invoke "shutdown" on gRPC, let the 
>> application wait for 2 seconds, then try to close the app.
>> 2. Every time the client finishes calling a remote procedure on the gRPC 
>> server, shut the gRPC client down at once. 
>> On Friday, March 4, 2022 at 9:18:19 PM UTC+8 Anh Tran wrote:
>>
>>> I have some time to revisit this problem today. Running with 
>>> GRPC_VERBOSITY set to DEBUG gives me this (anything starting with 
>>> [2022-03-04T23 is my own logging):
>>>
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] Logger level is set at info.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [warning] [process 501457, 
>>> thread 501457] Persistent not specified and will default to false
>>> D0304 23:15:41.640009936  501457 ev_posix.cc:172]            Using 
>>> polling engine: epollex
>>> D0304 23:15:41.640079032  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "grpclb"
>>> D0304 23:15:41.640106084  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "priority_experimental"
>>> D0304 23:15:41.640124027  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "weighted_target_experimental"
>>> D0304 23:15:41.640127329  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "pick_first"
>>> D0304 23:15:41.640150128  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "round_robin"
>>> D0304 23:15:41.640153574  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "ring_hash_experimental"
>>> D0304 23:15:41.640162650  501457 dns_resolver_ares.cc:624]   Using ares 
>>> dns resolver
>>> D0304 23:15:41.640230087  501457 certificate_provider_registry.cc:33] 
>>> registering certificate provider factory for "file_watcher"
>>> D0304 23:15:41.640236782  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "cds_experimental"
>>> D0304 23:15:41.640241375  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "xds_cluster_impl_experimental"
>>> D0304 23:15:41.640258747  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "xds_cluster_resolver_experimental"
>>> D0304 23:15:41.640262652  501457 lb_policy_registry.cc:42]   registering 
>>> LB policy factory for "xds_cluster_manager_experimental"
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] gRPC client at localhost:4500 is registering with the 
>>> server...
>>> I0304 23:15:41.640909979  501458 subchannel.cc:945]          subchannel 
>>> 0x55e6713dcea0 {address=ipv6:[::1]:4500, 
>>> args=grpc.client_channel_factory=0x55e6713d6a20, 
>>> grpc.default_authority=localhost:4500, 
>>> grpc.internal.subchannel_pool=0x55e6713bb810, 
>>> grpc.primary_user_agent=grpc-c++/1.44.0-dev, 
>>> grpc.resource_quota=0x55e6713d7b40, grpc.server_uri=dns:///localhost:4500}: 
>>> connect failed: {"created":"@1646399741.640833261","description":"Failed to 
>>> connect to remote host: Connection 
>>> refused","errno":111,"file":"/home/hoover/workspace/DPAV/Linux/OpenSource/gRPC-focal/src/core/lib/iomgr/tcp_client_posix.cc","file_line":200,"os_error":"Connection
>>>  
>>> refused","syscall":"connect","target_address":"ipv6:[::1]:4500"}
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] ... gRPC client at localhost:4500 successfully registered 
>>> with server
>>> I0304 23:15:41.642209150  501462 server_builder.cc:327]      Synchronous 
>>> server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
>>> I0304 23:15:41.642437546  501462 socket_utils_common_posix.cc:353] 
>>> TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] JobClient client constructed.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] HelloWorld client constructed.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] New id assigned: 
>>> localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> is trying to register client for job type: : hello_world
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501463] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> registered with Job Server 4501
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501457] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> received ACK for RegisterNewClient.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> identified JSON message as issueNewJob.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> accepted job.
>>> [2022-03-04T23:15:41+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> acknowledged the job.
>>> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501485] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> identified JSON message as jobProgressUpdateAck.
>>> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> received ACK for final JobProgressUpdate.
>>> [2022-03-04T23:15:42+10:00] [workflow_client] [info] [process 501457, 
>>> thread 501484] Client localhost:4502:598ec6d9-8a75-4500-b34f-9cab3160c642 
>>> finished processing a job and now shutting down.
>>> D0304 23:15:42.482888232  501484 chttp2_transport.cc:1715]   ipv4:
>>> 127.0.0.1:56916: Sending goaway 
>>> err={"created":"@1646399742.482860742","description":"Server 
>>> shutdown","file":"/home/hoover/workspace/DPAV/Linux/OpenSource/gRPC-focal/src/core/lib/surface/server.cc","file_line":481,"grpc_status":0}
>>>
>>> On Monday, February 14, 2022 at 1:37:12 PM UTC+10 Anh Tran wrote:
>>>
>>>> Hi team,
>>>>
>>>> Yes, that is the case. I am also confused...
>>>>
>>>>
>>>> On Saturday, February 12, 2022 at 9:00:29 AM UTC+10 [email protected] 
>>>> wrote:
>>>>
>>>>> Hi Ahn,
>>>>>
>>>>> Thanks for the stack trace, but I'm a bit confused. Is this what you 
>>>>> got while your program was stuck waiting forever? It looks a bit off, 
>>>>> because Valgrind complains about "Conditional jump or move depends on 
>>>>> uninitialised value" and the call stack appears to show a deadlock 
>>>>> check inside.
>>>>>
>>>>> On Thursday, February 10, 2022 at 1:10:12 AM UTC-8 Anh Tran wrote:
>>>>>
>>>>>> Hi @google.com,
>>>>>>
>>>>>> Thanks for getting back to me on this. Please see my stack trace from 
>>>>>> valgrind:
>>>>>>
>>>>>> ==287419== Memcheck, a memory error detector
>>>>>> ==287419== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward 
>>>>>> et al.
>>>>>> ==287419== Using Valgrind-3.18.1 and LibVEX; rerun with -h for 
>>>>>> copyright info
>>>>>> ==287419== Command: 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli 
>>>>>> --job-type 
>>>>>> slam
>>>>>> ==287419== Parent PID: 287418
>>>>>> ==287419== 
>>>>>> ==287419== Conditional jump or move depends on uninitialised value(s)
>>>>>> ==287419==    at 0x91743A: void** NextStackFrame<true, false>(void**, 
>>>>>> void const*, unsigned long, unsigned long) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x916F98: int UnwindImpl<false, false>(void**, int*, 
>>>>>> int, int, void const*, int*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x916D8D: absl::lts_20211102::GetStackTrace(void**, 
>>>>>> int, int) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F27C: absl::lts_20211102::(anonymous 
>>>>>> namespace)::GetStack(void**, int) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x924569: 
>>>>>> absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId,
>>>>>>  
>>>>>> int, int (*)(void**, int)) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F342: 
>>>>>> absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F84C: 
>>>>>> absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) 
>>>>>> (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90FB6E: absl::lts_20211102::Mutex::Lock() (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x8A3AD0: gpr_mu_lock (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x7DFA25: begin_worker(grpc_pollset*, 
>>>>>> grpc_pollset_worker*, grpc_pollset_worker**, long) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x7DFFDA: pollset_work(grpc_pollset*, 
>>>>>> grpc_pollset_worker**, long) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x7716F4: pollset_work(grpc_pollset*, 
>>>>>> grpc_pollset_worker**, long) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==  Uninitialised value was created by a stack allocation
>>>>>> ==287419==    at 0x1C3FD0: 
>>>>>> Emesent::GRPCQueue::Stub::Register(grpc::ClientContext*, Emesent::Empty 
>>>>>> const&, Emesent::RegistrationResponse*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419== 
>>>>>> ==287419== Thread 7 grpcpp_sync_ser:
>>>>>> ==287419== Conditional jump or move depends on uninitialised value(s)
>>>>>> ==287419==    at 0x916F7F: int UnwindImpl<false, false>(void**, int*, 
>>>>>> int, int, void const*, int*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x916D8D: absl::lts_20211102::GetStackTrace(void**, 
>>>>>> int, int) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F27C: absl::lts_20211102::(anonymous 
>>>>>> namespace)::GetStack(void**, int) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x924569: 
>>>>>> absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId,
>>>>>>  
>>>>>> int, int (*)(void**, int)) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F342: 
>>>>>> absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90F84C: 
>>>>>> absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) 
>>>>>> (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x90FB6E: absl::lts_20211102::Mutex::Lock() (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x8A3AD0: gpr_mu_lock (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x5121A4: grpc_cq_begin_op(grpc_completion_queue*, 
>>>>>> void*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x5199C4: 
>>>>>> grpc_core::Server::ShutdownAndNotify(grpc_completion_queue*, void*) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x51CD63: grpc_server_shutdown_and_notify (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==    by 0x231AA2: 
>>>>>> grpc::Server::ShutdownInternal(gpr_timespec) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==  Uninitialised value was created by a stack allocation
>>>>>> ==287419==    at 0x1D37E0: std::_Function_handler<void 
>>>>>> (std::__cxx11::basic_string<char, std::char_traits<char>, 
>>>>>> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
>>>>>> std::char_traits<char>, std::allocator<char> > const&), 
>>>>>> workflow_client::GRPCIcomm::start_and_wait()::{lambda(std::__cxx11::basic_string<char,
>>>>>>  
>>>>>> std::char_traits<char>, std::allocator<char> > const&, 
>>>>>> std::__cxx11::basic_string<char, std::char_traits<char>, 
>>>>>> std::allocator<char> > const&)#1}>::_M_invoke(std::_Any_data const&, 
>>>>>> std::__cxx11::basic_string<char, std::char_traits<char>, 
>>>>>> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
>>>>>> std::char_traits<char>, std::allocator<char> > const&) (in 
>>>>>> /home/atran/codes/workflow_client/dist/workflow/job_client_cli)
>>>>>> ==287419==
>>>>>>
>>>>>> In case this is interesting, this is how I start my server and wait 
>>>>>> until shutdown:
>>>>>>
>>>>>> std::string server_addr_and_port = base_address_ + ":" + id_;
>>>>>> grpc::ServerBuilder server_builder;
>>>>>> server_builder.AddListeningPort(server_addr_and_port,
>>>>>>     grpc::InsecureServerCredentials());
>>>>>>
>>>>>> GRPCQueueImpl grpc_queue_service;
>>>>>> server_builder.RegisterService(&grpc_queue_service);
>>>>>> server_ = server_builder.BuildAndStart();
>>>>>> server_->Wait();
>>>>>> On Thursday, February 10, 2022 at 9:08:39 AM UTC+10 
>>>>>> [email protected] wrote:
>>>>>>
>>>>>>> I have also experienced the same issue in my work environment. Our 
>>>>>>> workaround was to have an N-second timeout, check a shutdown flag, 
>>>>>>> and then call shutdown.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Feb 9, 2022 at 3:54 PM, '[email protected]' via grpc.io wrote:
>>>>>>>
>>>>>>> Would you capture the call stack when the Shutdown function appears 
>>>>>>> to hang?
>>>>>>>
>>>>>>> On Wednesday, February 2, 2022 at 10:24:14 PM UTC-8 Anh Tran wrote:
>>>>>>>
>>>>>>>> The last commit on my local copy of gRPC is from 06 Jan 2022, so 
>>>>>>>> it's not that old.
>>>>>>>> On Thursday, February 3, 2022 at 4:51:16 AM UTC+10 Easwar 
>>>>>>>> Swaminathan wrote:
>>>>>>>>
>>>>>>>>> Have any of y'all seen this in a more recent version of gRPC? 
>>>>>>>>> `1.30.2` is really old.
>>>>>>>>>
>>>>>>>>> On Tuesday, February 1, 2022 at 12:15:54 AM UTC-8 Prasanta Behera 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> No, I've not found any solution yet! I was trying to put together 
>>>>>>>>>> a hello-world kind of example so I could reproduce it and open a 
>>>>>>>>>> ticket, but as you said, with such simple examples it never 
>>>>>>>>>> reproduces...
>>>>>>>>>>
>>>>>>>>>> On Tuesday, February 1, 2022 at 1:42:43 PM UTC+5:30 Anh Tran 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I should also add that this behaviour happens when multiple 
>>>>>>>>>>> threads are involved. 
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, February 1, 2022 at 3:45:04 PM UTC+10 Anh Tran wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately I came across the same problem. 
>>>>>>>>>>>> When trying to isolate it (with a simple hello-world 
>>>>>>>>>>>> server-client), the problem seems to go away, i.e. the server 
>>>>>>>>>>>> can be shut down. 
>>>>>>>>>>>>
>>>>>>>>>>>> But when used in my own project, the hang happens. And despite 
>>>>>>>>>>>> my efforts to track down the problem, I can't seem to spot one.
>>>>>>>>>>>> This is weird, since this version of Shutdown should force a 
>>>>>>>>>>>> shutdown after a certain period, as described in the docs.
>>>>>>>>>>>>
>>>>>>>>>>>> Prasanta, did you happen to find a solution to your problem? 
>>>>>>>>>>>> Maybe it will give a hint to mine.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, January 20, 2022 at 10:02:43 PM UTC+10 Prasanta 
>>>>>>>>>>>> Behera wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> gRPC version : 1.30.2 (c++)
>>>>>>>>>>>>> $ gcc --version
>>>>>>>>>>>>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
>>>>>>>>>>>>>
>>>>>>>>>>>>> We use gRPC in synchronous mode and stop our gRPC services 
>>>>>>>>>>>>> with this code:
>>>>>>>>>>>>>
>>>>>>>>>>>>> m_server->Shutdown(std::chrono::system_clock::now() + 
>>>>>>>>>>>>>     std::chrono::seconds(1));
>>>>>>>>>>>>> m_server->Wait();
>>>>>>>>>>>>>
>>>>>>>>>>>>> On rare occasions (about once in 1000 tries) the Shutdown() 
>>>>>>>>>>>>> call does NOT return and the process just hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there anything I am missing to get a reliable shutdown of 
>>>>>>>>>>>>> the server?
>>>>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/4f13ee59-bef1-43f2-b6a4-ae30ba6cd7abn%40googlegroups.com.
