tvalentyn commented on issue #30867: URL: https://github.com/apache/beam/issues/30867#issuecomment-2048473295
TL;DR: a thread that executes `bigtable/transports/grpc.create_channel()` later calls into `cygrpc.Channel()`, which is likely a Python extension and holds the GIL indefinitely. Other threads therefore cannot run, and the SDK stops responding to `/sdk_status` RPC calls.

https://cloud.google.com/dataflow/docs/guides/common-errors#worker-lost-contact also explains this failure mode.

```
Traceback for thread 100 (python) [Has the GIL] (most recent call last):
    ...
        transport = self._create_gapic_client_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/client.py", line 285, in _create_gapic_client_channel
        channel = grpc_transport.create_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable_v2/services/bigtable/transports/grpc.py", line 217, in create_channel
        return grpc_helpers.create_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 386, in create_channel
        return grpc.secure_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/grpc/__init__.py", line 2119, in secure_channel
        return _channel.Channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 2046, in __init__
        self._channel = cygrpc.Channel(
```
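For anyone trying to catch a similar stall from inside the process, here is a minimal sketch using the standard-library `faulthandler` module. As far as I understand, its watchdog runs in a C-level thread and does not need the GIL to write the dump, which is why it may still report when ordinary Python threads are starved. The names `run_with_stall_watchdog` and `slow_channel_setup` are hypothetical and only for illustration; they are not part of Beam, Bigtable, or gRPC. The stand-in uses `time.sleep`, which releases the GIL, so this only shows the watchdog firing and does not reproduce the hang itself.

```python
import faulthandler
import tempfile
import time


def run_with_stall_watchdog(fn, timeout_s, log_file):
    """Run fn(); if it has not returned after timeout_s, dump all thread stacks.

    Hypothetical helper: arms faulthandler's watchdog before the call and
    disarms it afterwards. log_file must expose a real file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once the call has returned (or raised).
        faulthandler.cancel_dump_traceback_later()


def slow_channel_setup():
    # Hypothetical stand-in for a blocking call such as channel creation.
    time.sleep(1.0)
    return "channel"


with tempfile.TemporaryFile(mode="w+") as log:
    result = run_with_stall_watchdog(slow_channel_setup, 0.2, log)
    # faulthandler writes straight to the file descriptor, so rewind and read.
    log.seek(0)
    dump = log.read()

print(dump)
```

In a real worker the dump would more likely go to `sys.stderr` with a timeout of minutes, not a fraction of a second. Note that `faulthandler` prints frames most recent call first, the opposite order from the traceback above.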