[
https://issues.apache.org/jira/browse/BEAM-12603?focusedWorklogId=766347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766347
]
ASF GitHub Bot logged work on BEAM-12603:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/May/22 23:54
Start Date: 04/May/22 23:54
Worklog Time Spent: 10m
Work Description: y1chi commented on PR #17537:
URL: https://github.com/apache/beam/pull/17537#issuecomment-1118034282
> Thanks! I think this workaround is preferable to retrying the entire test.
A couple of questions/concerns:
>
> * Are we sure it's safe to retry these UNAVAILABLE responses? The [gRPC
[docs](https://grpc.github.io/grpc/core/md_doc_statuscodes.html) note "it is
not always safe to retry non-idempotent operations."
> * I think this is a better workaround, but it's still a little concerning
that we don't understand the root cause - any thoughts on how we could dig
deeper?
I believe UNAVAILABLE in this cases means that the underneath tcp connection
was broken (not sure why and unclear how to debug that) before the request is
handled so that means it should be retriable(had more than 250 runs and didn't
see any side effect like getting wrong results after adding retry). I enabled
the GRPC debug log but didn't find anything interesting also.
Issue Time Tracking
-------------------
Worklog Id: (was: 766347)
Time Spent: 3.5h (was: 3h 20m)
> Flaky:
> apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
> ----------------------------------------------------------------------------------------------------------
>
> Key: BEAM-12603
> URL: https://issues.apache.org/jira/browse/BEAM-12603
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, test-failures
> Reporter: Tyson Hamilton
> Priority: P2
> Labels: flake
> Fix For: Not applicable
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> The
> `apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers`
> tests are flaky and causing precommit failures that seem similar.
> `test_pardo_windowed_side_inputs` :
> [https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/4417/console]
> {code:java}
> 23:32:35 Exception in thread read_grpc_client_inputs:
> 23:32:35 Traceback (most recent call last):
> 23:32:35 File "/usr/lib/python3.8/threading.py", line 932, in
> _bootstrap_inner
> 23:32:35 self.run()
> 23:32:35 File "/usr/lib/python3.8/threading.py", line 870, in run
> 23:32:35 self._target(*self._args, **self._kwargs)
> 23:32:35 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/apache_beam/runners/worker/data_plane.py",
> line 587, in <lambda>
> 23:32:35 target=lambda: self._read_inputs(elements_iterator),
> 23:32:35 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/apache_beam/runners/worker/data_plane.py",
> line 570, in _read_inputs
> 23:32:35 for elements in elements_iterator:
> 23:32:35 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-cython/py38-cython/lib/python3.8/site-packages/grpc/_channel.py",
> line 426, in __next__
> 23:32:35 return self._next()
> 23:32:35 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-cython/py38-cython/lib/python3.8/site-packages/grpc/_channel.py",
> line 826, in _next
> 23:32:35 raise self
> 23:32:35 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of
> RPC that terminated with:
> 23:32:35 status = StatusCode.UNAVAILABLE
> 23:32:35 details = "Broken pipe"
> 23:32:35 debug_error_string =
> "{"created":"@1626071403.252458842","description":"Error received from peer
> ipv4:127.0.0.1:37459","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"Broken
> pipe","grpc_status":14}"
> 23:32:35 >
> 23:32:35 Traceback (most recent call last):
> 23:32:35 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/apache_beam/runners/worker/data_plane.py",
> line 459, in input_elements
> 23:32:35 element = received.get(timeout=1)
> 23:32:35 File "/usr/lib/python3.8/queue.py", line 178, in get
> 23:32:35 raise Empty
> 23:32:35 _queue.Empty
> {code}
>
> `test_pack_combiners` :
> [https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/4415/consoleFull]
> {code:java}
> 11:41:14 Exception ignored in: <object repr() failed>
> 11:41:14 Traceback (most recent call last):
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/target/.tox-py36-cython/py36-cython/lib/python3.6/site-packages/grpc/_channel.py",
> line 444, in __del__
> 11:41:14 with self._state.condition:
> 11:41:14 AttributeError: '_MultiThreadedRendezvous' object has no attribute
> '_state'
> 11:41:14 Traceback (most recent call last):
> 11:41:14 File "apache_beam/runners/common.py", line 1223, in
> apache_beam.runners.common.DoFnRunner.process
> 11:41:14 return self.do_fn_invoker.invoke_process(windowed_value)
> 11:41:14 File "apache_beam/runners/common.py", line 752, in
> apache_beam.runners.common.PerWindowInvoker.invoke_process
> 11:41:14 self._invoke_process_per_window(
> 11:41:14 File "apache_beam/runners/common.py", line 816, in
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
> 11:41:14 [si[global_window] for si in self.side_inputs]))
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
> line 427, in __getitem__
> 11:41:14 self._cache[target_window] =
> self._side_input_data.view_fn(raw_view)
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
> line 353, in __iter__
> 11:41:14 self._state_handler.blocking_get(self._state_key,
> self._coder_impl))
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
> line 1184, in blocking_get
> 11:41:14 self._partially_cached_iterable(state_key, coder))
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
> line 1290, in _partially_cached_iterable
> 11:41:14 data, continuation_token = self._underlying.get_raw(state_key,
> None)
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
> line 1057, in get_raw
> 11:41:14 continuation_token=continuation_token)))
> 11:41:14 File
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
> line 1095, in _blocking_request
> 11:41:14 raise t(v).with_traceback(tb)
> 11:41:14 TypeError: __init__() missing 3 required positional arguments:
> 'call', 'response_deserializer', and 'deadline'
> {code}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)