Valentyn Tymofieiev created BEAM-7628:
-----------------------------------------

             Summary: Retry createJob requests in Dataflow Runner for retriable errors.
                 Key: BEAM-7628
                 URL: https://issues.apache.org/jira/browse/BEAM-7628
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Valentyn Tymofieiev


When Dataflow Runner submits a job for remote execution, the request can, in 
rare cases, fail with a retriable error. Dataflow Runner could recognize a 
class of retriable errors and resubmit the job when one is encountered. 
A sample retriable error encountered by the Beam Java SDK: 

```
java.lang.RuntimeException: Failed to create a workflow job: The operation was cancelled.
11:32:14        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:869)
11:32:14        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:178)
11:32:14        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
11:32:14        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
...
11:32:14 Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 499 Client Closed Request
11:32:14 {
11:32:14   "code" : 499,
11:32:14   "errors" : [ {
11:32:14     "domain" : "global",
11:32:14     "message" : "The operation was cancelled.",
11:32:14     "reason" : "backendError"
11:32:14   } ],
11:32:14   "message" : "The operation was cancelled.",
11:32:14   "status" : "CANCELLED"
11:32:14 }
11:32:14        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
11:32:14        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
11:32:14        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
11:32:14        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
11:32:14        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
11:32:14        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
11:32:14        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
11:32:14        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
11:32:14        at org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
11:32:14        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:855)
11:32:14        ... 41 more
```
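
The retry behavior proposed above could look roughly like the following. This
is only a minimal sketch using the plain JDK, not Beam's actual implementation:
the class RetryingSubmitter, the exception HttpStatusException (a stand-in for
an HTTP error such as GoogleJsonResponseException), and the method
callWithRetries are hypothetical names. Treating status 499 (from the trace
above) and 503 as retriable is likewise an assumption made for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Beam: retries a call on retriable errors.
class RetryingSubmitter {

  // Stand-in for an HTTP error response carrying a status code (assumption).
  static class HttpStatusException extends Exception {
    final int statusCode;

    HttpStatusException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }
  }

  // Assumption: 499 (seen in the trace above) and 503 count as retriable.
  static boolean isRetriable(Exception e) {
    if (!(e instanceof HttpStatusException)) {
      return false;
    }
    int code = ((HttpStatusException) e).statusCode;
    return code == 499 || code == 503;
  }

  // Runs the call up to maxAttempts times, doubling the delay after each
  // retriable failure. Non-retriable errors and the final failure are rethrown.
  static <T> T callWithRetries(Callable<T> call, int maxAttempts, long initialDelayMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long delayMillis = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || !isRetriable(e)) {
          throw e;
        }
        Thread.sleep(delayMillis);
        delayMillis *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated createJob request: fails twice with 499, then succeeds.
    String jobId = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new HttpStatusException(499, "The operation was cancelled.");
      }
      return "job-id-placeholder";
    }, 5, 10L);
    System.out.println(jobId + " after " + calls[0] + " attempts");
  }
}
```

A real change would presumably wrap the createJob call in DataflowRunner.run
in this way. It would also need to consider whether a request that failed on
the client side might still have created a job on the service, so that a
resubmission does not start a duplicate.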



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
