scwhittle opened a new issue, #25530: URL: https://github.com/apache/beam/issues/25530
### What happened? Pipelines writing many files to GCS can experience RESOURCE_EXHAUSTED errors from GCS metadata operations. In particular during finalizing temporary files, they are renamed from temp directory to the final location. There should be sufficient backoff so that the pipeline reduces it's traffic to an eventually supported rate for GCS. Currently it can just remain in a state where there is a steady rate of RESOURCE_EXHAUSTED errors preventing pipeline progress as a bundle can only complete successfully if all renames it contains are successful. See https://cloud.google.com/storage/docs/request-rate for guidance on GCS operation rates. When 429 Resource Exhausted errors are encountered the operation should be retried with backoff. Retries of individual requests are performed internally to the rewrite request using the RetryHttpRequestInitializer https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L129 Retry of overall batches is also performed here: https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L1005 Those may need to have more retry attempts or backoff in the face of resource exhausted errors. When they fail, the entire bundle itself is reprocessed which may just trigger more load. Additionally when the RetryHttpRequestInitializer attempts are exhausted there appears to be a parsing error that then triggers retrying the entire bundle. I think we could override the request parser to handle this, or we could catch it in GCS util [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L1040) and just retry the specific batch. ``` Caused by: java.io.IOException: Error executing batch GCS request at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.executeBatches(GcsUtil.java:780) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.rewriteHelper(GcsUtil.java:1040) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.rename(GcsUtil.java:991) at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.rename(GcsFileSystem.java:171) at org.apache.beam.sdk.io.FileSystems.renameInternal(FileSystems.java:323) at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:308) at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:802) at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:1077) Caused by: java.util.concurrent.ExecutionException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: (BufferedInputStream); line: 1, column: 6] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:61) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.executeBatches(GcsUtil.java:772) ... Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: (BufferedInputStream); line: 1, column: 6] at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391) at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:745) at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3635) at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2734) at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:902) at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:794) at com.google.api.client.json.jackson2.JacksonParser.nextToken(JacksonParser.java:53) at com.google.api.client.json.JsonParser.startParsing(JsonParser.java:213) at com.google.api.client.json.JsonParser.parse(JsonParser.java:358) at com.google.api.client.json.JsonParser.parse(JsonParser.java:335) at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:79) at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:73) at com.google.api.client.googleapis.batch.BatchUnparsedResponse.getParsedDataClass(BatchUnparsedResponse.java:223) at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseAndCallback(BatchUnparsedResponse.java:208) at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseNextResponse(BatchUnparsedResponse.java:149) at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:269) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil$1.execute(GcsUtil.java:243) at org.apache.beam.sdk.util.MoreFutures.lambda$runAsync$2(MoreFutures.java:141) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) ``` ### Issue Priority Priority: 2 (default / most bugs should be filed as P2) ### Issue Components - [ ] Component: Python SDK - [X] Component: Java SDK - [ ] Component: Go SDK - [ ] Component: Typescript SDK - [X] Component: IO connector - [ ] Component: Beam examples - [ ] Component: Beam playground - [ ] Component: Beam katas - [ ] Component: Website - [ ] Component: Spark Runner - [ ] Component: Flink Runner - [ ] Component: Samza Runner - [ ] Component: Twister2 Runner - [ ] Component: Hazelcast Jet Runner - [ ] Component: Google Cloud Dataflow Runner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
