[ https://issues.apache.org/jira/browse/BEAM-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944295#comment-15944295 ]
Davor Bonaci commented on BEAM-1735:
------------------------------------
This failure comes from a batched copy request to Cloud Storage during FileBasedSink
finalization. The stack trace below is from an SDK built before the Beam package rename:
java.io.IOException: Error executing batch GCS request
    at com.google.cloud.dataflow.sdk.util.GcsUtil.executeBatches(GcsUtil.java:414)
    at com.google.cloud.dataflow.sdk.util.GcsUtil.copy(GcsUtil.java:421)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$GcsOperations.copy(FileBasedSink.java:663)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.copyToOutputFiles(FileBasedSink.java:378)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.finalize(FileBasedSink.java:342)
    at com.google.cloud.dataflow.sdk.io.Write$Bound$2.processElement(Write.java:416)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error trying to copy [cut]/part-14135-of-22733.protobuf.avro: {"code":403,"errors":[{"domain":"usageLimits","message":"User Rate Limit Exceeded","reason":"userRateLimitExceeded"}],"message":"User Rate Limit Exceeded"}
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:455)
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
    at com.google.cloud.dataflow.sdk.util.GcsUtil.executeBatches(GcsUtil.java:409)
    at com.google.cloud.dataflow.sdk.util.GcsUtil.copy(GcsUtil.java:421)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$GcsOperations.copy(FileBasedSink.java:663)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.copyToOutputFiles(FileBasedSink.java:378)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.finalize(FileBasedSink.java:342)
    at com.google.cloud.dataflow.sdk.io.Write$Bound$2.processElement(Write.java:416)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error trying to copy [cut]/part-14135-of-22733.protobuf.avro: {"code":403,"errors":[{"domain":"usageLimits","message":"User Rate Limit Exceeded","reason":"userRateLimitExceeded"}],"message":"User Rate Limit Exceeded"}
    at com.google.cloud.dataflow.sdk.util.GcsUtil$3.onFailure(GcsUtil.java:484)
    at com.google.api.client.googleapis.batch.json.JsonBatchCallback.onFailure(JsonBatchCallback.java:54)
    at com.google.api.client.googleapis.batch.json.JsonBatchCallback.onFailure(JsonBatchCallback.java:50)
    at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseAndCallback(BatchUnparsedResponse.java:223)
    at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseNextResponse(BatchUnparsedResponse.java:155)
    at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:253)
    at com.google.cloud.dataflow.sdk.util.GcsUtil$2.call(GcsUtil.java:402)
    at com.google.cloud.dataflow.sdk.util.GcsUtil$2.call(GcsUtil.java:400)
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    ... 3 more
Current retry logic:
* No retry for an individual copy within the batched request.
* Retry exists at the batched-request level.
* Retry exists at the bundle level.
We could theoretically add logic to extract the individual copies that failed within
a batched request, keep them, and retry only those (within the same bundle). That
would be complex, error-prone logic, and it is hard to think of another failure type
where it would help -- other than hitting quota over a small time window. I can be
convinced of the value of a low-level fix if it can be shown that the probability of
GCS failing only a portion of a batched copy is high enough.
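For illustration only, a rough sketch of what per-copy retry inside a batch could look
like; CopySpec, Gcs, newBatchRequest and enqueueCopy are made-up stand-ins for GcsUtil
internals, not the actual API:
{code:java}
import com.google.api.client.googleapis.batch.BatchRequest;
import com.google.api.client.googleapis.batch.json.JsonBatchCallback;
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.http.HttpHeaders;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class SelectiveBatchCopyRetry {

  /** One source/destination pair; illustrative only. */
  static class CopySpec {
    final String from;
    final String to;
    CopySpec(String from, String to) { this.from = from; this.to = to; }
  }

  /** Assumed helpers standing in for GcsUtil's batch plumbing. */
  interface Gcs {
    BatchRequest newBatchRequest();
    void enqueueCopy(BatchRequest batch, CopySpec spec, JsonBatchCallback<Void> callback)
        throws IOException;
  }

  static void copyWithPerItemRetry(Gcs gcs, List<CopySpec> copies, int maxAttempts)
      throws IOException, InterruptedException {
    List<CopySpec> pending = new ArrayList<>(copies);
    for (int attempt = 1; !pending.isEmpty(); attempt++) {
      ConcurrentLinkedQueue<CopySpec> rateLimited = new ConcurrentLinkedQueue<>();
      BatchRequest batch = gcs.newBatchRequest();
      for (CopySpec spec : pending) {
        gcs.enqueueCopy(batch, spec, new JsonBatchCallback<Void>() {
          @Override
          public void onSuccess(Void unused, HttpHeaders headers) {}

          @Override
          public void onFailure(GoogleJsonError e, HttpHeaders headers) {
            // Keep only the rate-limited copies (like the 403 userRateLimitExceeded
            // above) for retry; a real implementation should inspect the error
            // reason, not just the status code.
            if (e.getCode() == 403 || e.getCode() == 429) {
              rateLimited.add(spec);
            } else {
              throw new RuntimeException("Non-retryable copy failure: " + e.getMessage());
            }
          }
        });
      }
      batch.execute();
      pending = new ArrayList<>(rateLimited);
      if (!pending.isEmpty()) {
        if (attempt >= maxAttempts) {
          throw new IOException(
              pending.size() + " copies still rate-limited after " + maxAttempts + " attempts");
        }
        Thread.sleep((long) (Math.pow(2, attempt) * 100)); // exponential backoff between batches
      }
    }
  }
}
{code}
Only the rate-limited copies are re-enqueued on the next batch; everything else still
fails the bundle as it does today.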
> Retry 403:rateLimitExceeded in GCS
> ----------------------------------
>
> Key: BEAM-1735
> URL: https://issues.apache.org/jira/browse/BEAM-1735
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-gcp
> Reporter: Rafael Fernandez
> Priority: Minor
>
> The GCS documentation [1] states that rateLimitExceeded, a 403 error, should be
> retried with exponential backoff. We currently do not retry it.
> [1] https://cloud.google.com/storage/docs/json_api/v1/status-codes
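For reference, a minimal sketch of the kind of retryability check the linked
documentation implies, written against the google-api-client GoogleJsonError type;
isRetryable is an assumed helper name, not an existing SDK hook:
{code:java}
import com.google.api.client.googleapis.json.GoogleJsonError;

class GcsRetryPolicy {

  /** True if the error should be retried with exponential backoff. */
  static boolean isRetryable(GoogleJsonError error) {
    int code = error.getCode();
    if (code == 429 || code / 100 == 5) {
      return true; // too-many-requests and server errors are retryable
    }
    if (code == 403 && error.getErrors() != null) {
      // A 403 is retryable only for the rate-limit reasons, not for real
      // permission failures.
      return error.getErrors().stream().anyMatch(
          e -> "rateLimitExceeded".equals(e.getReason())
              || "userRateLimitExceeded".equals(e.getReason()));
    }
    return false;
  }
}
{code}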