Valentyn Tymofieiev created BEAM-7628:
-----------------------------------------
Summary: Retry createJob requests in Dataflow Runner for retriable
errors.
Key: BEAM-7628
URL: https://issues.apache.org/jira/browse/BEAM-7628
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Valentyn Tymofieiev
When Dataflow Runner submits a job for remote execution, the request can in
rare cases fail with a retriable error. Dataflow Runner could recognize a
class of retriable errors and resubmit the job when one is encountered.
Sample retriable error encountered by the Beam Java SDK:
```
java.lang.RuntimeException: Failed to create a workflow job: The operation was cancelled.
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:869)
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:178)
11:32:14 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
11:32:14 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
...
11:32:14 Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 499 Client Closed Request
11:32:14 {
11:32:14   "code" : 499,
11:32:14   "errors" : [ {
11:32:14     "domain" : "global",
11:32:14     "message" : "The operation was cancelled.",
11:32:14     "reason" : "backendError"
11:32:14   } ],
11:32:14   "message" : "The operation was cancelled.",
11:32:14   "status" : "CANCELLED"
11:32:14 }
11:32:14 at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
11:32:14 at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
11:32:14 at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
11:32:14 at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
11:32:14 at org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:855)
11:32:14 ... 41 more
```
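One way the proposed behaviour could look is sketched below. This is not
Beam code: RetrySketch, HttpStatusException, isRetriable and callWithRetries
are hypothetical names invented for illustration, and HttpStatusException
merely stands in for an HTTP error such as the GoogleJsonResponseException in
the trace above. Status 499 comes from that trace; treating 503 as retriable
is an assumption, as are the attempt limit and backoff values.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names exist in Beam. */
class RetrySketch {

  /** Stand-in for an HTTP-level failure that carries a status code. */
  static class HttpStatusException extends Exception {
    final int statusCode;

    HttpStatusException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }
  }

  // 499 is the code seen in the issue; 503 is an illustrative assumption.
  static final Set<Integer> RETRIABLE_CODES = Set.of(499, 503);

  /** Returns true only for errors whose status code is in the retriable set. */
  static boolean isRetriable(Exception e) {
    return e instanceof HttpStatusException
        && RETRIABLE_CODES.contains(((HttpStatusException) e).statusCode);
  }

  /**
   * Calls the request, retrying with exponential backoff while the failure is
   * retriable and attempts remain. Any other failure is rethrown immediately.
   */
  static <T> T callWithRetries(
      Callable<T> request,
      Predicate<Exception> retriable,
      int maxAttempts,
      long initialBackoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long backoffMillis = initialBackoffMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return request.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || !retriable.test(e)) {
          throw e;
        }
        Thread.sleep(backoffMillis);
        backoffMillis *= 2; // double the wait before the next attempt
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated createJob call: fails twice with 499, then succeeds.
    String jobId =
        callWithRetries(
            () -> {
              if (++calls[0] < 3) {
                throw new HttpStatusException(499, "The operation was cancelled.");
              }
              return "job-123";
            },
            RetrySketch::isRetriable,
            5,
            10L);
    System.out.println(jobId + " after " + calls[0] + " attempts");
    // prints "job-123 after 3 attempts"
  }
}
```

A real change would also have to make sure that a retried request cannot
launch a duplicate job when the first attempt actually reached the service;
that concern is outside this sketch.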