[
https://issues.apache.org/jira/browse/BEAM-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884855#comment-16884855
]
Valentyn Tymofieiev edited comment on BEAM-6202 at 7/15/19 5:07 AM:
--------------------------------------------------------------------
Another example: runner.dataflow_client.get_job(job_id) fails with 404, which
causes the Dataflow runner to fail:
{noformat}
16:57:22 response = runner.dataflow_client.get_job(job_id)
16:57:22 Found:
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-07-14_16_41_46-17942851702589150224?project=apache-beam-testing.
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
line 197, in wrapper
16:57:22 return fun(*args, **kwargs)
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
line 663, in get_job
16:57:22 response = self._client.projects_locations_jobs.Get(request)
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py",
line 689, in Get
16:57:22 config, request, global_params=global_params)
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
line 731, in _RunMethod
16:57:22 return self.ProcessHttpResponse(method_config, http_response,
request)
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
line 737, in ProcessHttpResponse
16:57:22 self.__ProcessHttpResponse(method_config, http_response, request))
16:57:22 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
line 604, in __ProcessHttpResponse
16:57:22 http_response, method_config=method_config, request=request)
16:57:22 apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing
<https://dataflow.googleapis.com/v1b3/projects/apache-beam-testing/locations/us-central1/jobs/2019-07-14_16_38_18-18216836647829637555?alt=json>:
response: <{'vary': 'Origin, X-Origin, Referer', 'content-type':
'application/json; charset=UTF-8', 'date': 'Sun, 14 Jul 2019 23:39:06 GMT',
'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0',
'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff',
'transfer-encoding': 'chunked', 'status': '404', 'content-length': '280',
'-content-encoding': 'gzip'}>, content <{
16:57:22 "error": {
16:57:22 "code": 404,
16:57:22 "message": "(94573f72b4b58430): Information about job
2019-07-14_16_38_18-18216836647829637555 could not be found in our system.
Please double check the id is correct. If it is please contact customer
support.",
16:57:22 "status": "NOT_FOUND"
16:57:22 }
16:57:22 }
{noformat}
The flake caused by the 404 error is a known issue that is being addressed on
the service side; however, the Dataflow runner could retry as well.
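A minimal sketch of what such a client-side retry could look like, using only the standard library. The names NotFoundError and get_job_with_retry are hypothetical and are not part of Beam or apitools; NotFoundError merely stands in for apitools' HttpNotFoundError. The trace above shows get_job already passing through a wrapper in apache_beam/utils/retry.py, but how that wrapper treats 404s is not visible here, so this is an illustration of the idea rather than a proposed patch:

```python
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for an HTTP 404 error such as HttpNotFoundError."""


def get_job_with_retry(get_job, job_id, max_attempts=5,
                       initial_delay_secs=1.0, sleep=time.sleep):
    """Call get_job(job_id), retrying on NotFoundError with exponential backoff.

    get_job is any callable that takes a job id; all names here are
    assumptions for illustration. Re-raises the last NotFoundError if
    every attempt fails.
    """
    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return get_job(job_id)
        except NotFoundError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the 404 to the caller.
            sleep(delay)
            delay *= 2  # Double the wait before the next attempt.
```

Bounding the number of attempts matters because a 404 for a genuinely wrong job id is permanent, and retrying it forever would hide a real error.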
> Gracefully handle exceptions when waiting for Dataflow job completion.
> ----------------------------------------------------------------------
>
> Key: BEAM-6202
> URL: https://issues.apache.org/jira/browse/BEAM-6202
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core, test-failures
> Reporter: Robert Bradshaw
> Priority: Major
>
> If there is an error when trying to contact the dataflow service in Python's
> Dataflow.poll_for_job_completion, we may exit the thread prematurely.
> A typical manifestation is: Dataflow Runner fails with:
> {noformat}
> AssertionError: Job did not reach to a terminal state after waiting
> indefinitely.
> {noformat}
> however, job execution continues and succeeds.