[ 
https://issues.apache.org/jira/browse/BEAM-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884855#comment-16884855
 ] 

Valentyn Tymofieiev edited comment on BEAM-6202 at 7/15/19 5:07 AM:
--------------------------------------------------------------------

Another example: runner.dataflow_client.get_job(job_id) fails with a 404, which 
causes the Dataflow runner to fail:

{noformat}
16:57:22     response = runner.dataflow_client.get_job(job_id)
16:57:22 Found: 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-07-14_16_41_46-17942851702589150224?project=apache-beam-testing.
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
 line 197, in wrapper
16:57:22     return fun(*args, **kwargs)
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
 line 663, in get_job
16:57:22     response = self._client.projects_locations_jobs.Get(request)
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py",
 line 689, in Get
16:57:22     config, request, global_params=global_params)
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 731, in _RunMethod
16:57:22     return self.ProcessHttpResponse(method_config, http_response, 
request)
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 737, in ProcessHttpResponse
16:57:22     self.__ProcessHttpResponse(method_config, http_response, request))
16:57:22   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 604, in __ProcessHttpResponse
16:57:22     http_response, method_config=method_config, request=request)
16:57:22 apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing 
<https://dataflow.googleapis.com/v1b3/projects/apache-beam-testing/locations/us-central1/jobs/2019-07-14_16_38_18-18216836647829637555?alt=json>:
 response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 
'application/json; charset=UTF-8', 'date': 'Sun, 14 Jul 2019 23:39:06 GMT', 
'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 
'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 
'transfer-encoding': 'chunked', 'status': '404', 'content-length': '280', 
'-content-encoding': 'gzip'}>, content <{
16:57:22   "error": {
16:57:22     "code": 404,
16:57:22     "message": "(94573f72b4b58430): Information about job 
2019-07-14_16_38_18-18216836647829637555 could not be found in our system. 
Please double check the id is correct. If it is please contact customer 
support.",
16:57:22     "status": "NOT_FOUND"
16:57:22   }
16:57:22 }
{noformat}

The flake caused by the 404 error is a known issue that is being addressed on 
the service side; however, the Dataflow runner could retry as well.
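
For illustration only, a client-side retry could look roughly like the sketch 
below. It is not Beam's actual code: JobNotFoundError is a hypothetical 
stand-in for apitools' HttpNotFoundError, and get_job_with_retry, its 
parameters, and the backoff values are assumptions made for this example.

```python
import time


class JobNotFoundError(Exception):
    """Hypothetical stand-in for the HTTP 404 error raised by the API client."""


def get_job_with_retry(get_job, job_id, max_attempts=5, initial_delay=1.0,
                       sleep=time.sleep):
    """Call get_job(job_id), retrying on JobNotFoundError with exponential backoff.

    get_job is any callable that takes a job id; sleep is injectable so the
    helper can be exercised without real waiting. All names are illustrative.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return get_job(job_id)
        except JobNotFoundError:
            if attempt == max_attempts:
                # Out of attempts: treat the 404 as persistent and re-raise.
                raise
            sleep(delay)
            delay *= 2  # Double the wait before the next attempt.
```

A bounded number of attempts keeps a genuinely wrong job id from being 
retried forever, while a transient 404 like the one above gets a few more 
chances before the runner gives up.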


> Gracefully handle exceptions when waiting for Dataflow job completion.
> ----------------------------------------------------------------------
>
>                 Key: BEAM-6202
>                 URL: https://issues.apache.org/jira/browse/BEAM-6202
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core, test-failures
>            Reporter: Robert Bradshaw
>            Priority: Major
>
> If there is an error when trying to contact the dataflow service in Python's 
> Dataflow.poll_for_job_completion, we may exit the thread prematurely. 
> A typical manifestation is: Dataflow Runner fails with:
> {noformat}
> AssertionError: Job did not reach to a terminal state after waiting 
> indefinitely.
> {noformat}
> however, the job itself continues executing and succeeds.
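
As a rough sketch of what handling such errors gracefully could mean, the 
loop below keeps polling through a limited number of consecutive failures 
instead of exiting on the first one. It is not the actual 
poll_for_job_completion implementation: poll_until_terminal, get_state, the 
error threshold, and the set of terminal states are all assumptions chosen 
for illustration, and the state list may be incomplete.

```python
import time

# Illustrative subset of terminal job states; assumed, possibly incomplete.
TERMINAL_STATES = {"JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def poll_until_terminal(get_state, poll_interval=5.0, max_consecutive_errors=3,
                        sleep=time.sleep):
    """Poll get_state() until it returns a terminal state.

    Errors from get_state are tolerated up to max_consecutive_errors in a
    row; one more failure than that is re-raised to the caller.
    """
    consecutive_errors = 0
    while True:
        try:
            state = get_state()
        except Exception:
            # Broad catch for the sketch only; real code would likely
            # narrow this to the client's transient error types.
            consecutive_errors += 1
            if consecutive_errors > max_consecutive_errors:
                raise
        else:
            consecutive_errors = 0  # A successful poll resets the counter.
            if state in TERMINAL_STATES:
                return state
        sleep(poll_interval)
```

Resetting the counter after each successful poll means only an unbroken run 
of failures ends the wait, so an isolated service hiccup does not stop the 
runner from observing a job that later succeeds.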



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
