scwhittle opened a new issue, #23063: URL: https://github.com/apache/beam/issues/23063
### What happened?

The following error is encountered when using the BigQuery (BQ) sink, causing pipeline performance to suffer.

    java.lang.IllegalStateException
        at org.apache.beam.sdk.util.Preconditions.checkStateNotNull(Preconditions.java:452)
        at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWriteUnshardedRecords$WriteRecordsDoFn$DestinationState.lambda$retrieveErrorDetails$4(StorageApiWriteUnshardedRecords.java:375)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)

The code expects that when RetryManager calls the onError function with a list of operations, every operation in that list had an error. If a Context does not have an error, its error field is null and the exception above is thrown.
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java#L356

That expectation doesn't seem to hold, because the RetryManager passes the contexts for all of the operations to the error handler, not only the ones that failed.
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManager.java#L264

This also doesn't seem thread-safe, because the later operations in the list have not yet been awaited. They may not have completed, they may have completed successfully, or they may have completed with an error.

It seems like either everything should be awaited before evaluating the retries and errors, or perhaps only the currently awaited operation, the one that has an error, should be passed to the error handler.

### Issue Priority

Priority: 2

### Issue Component

Component: io-java-gcp
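
To illustrate the first suggestion from the description (await everything before evaluating errors), here is a minimal sketch. It is not Beam's actual code: `Context`, `Operation`, and `awaitAllAndCollectFailed` are hypothetical stand-ins for the RetryManager types, and `CompletableFuture` is assumed in place of whatever future type the real implementation uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class AwaitAllBeforeOnError {
  /** Hypothetical stand-in for RetryManager's per-operation context. */
  static class Context {
    final String name;
    Throwable error; // stays null unless the operation failed

    Context(String name) {
      this.name = name;
    }
  }

  /** Hypothetical pairing of an in-flight result with its context. */
  static class Operation {
    final CompletableFuture<String> future;
    final Context context;

    Operation(CompletableFuture<String> future, Context context) {
      this.future = future;
      this.context = context;
    }
  }

  /** Waits for every operation, then returns only the contexts that failed. */
  static List<Context> awaitAllAndCollectFailed(List<Operation> operations) {
    List<Context> failed = new ArrayList<>();
    for (Operation op : operations) {
      try {
        op.future.join(); // block until this operation has finished
      } catch (CompletionException e) {
        // Record the error on the context so a handler never sees a null error.
        op.context.error = e.getCause();
        failed.add(op.context);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    CompletableFuture<String> bad = new CompletableFuture<>();
    bad.completeExceptionally(new RuntimeException("append failed"));

    List<Operation> ops = new ArrayList<>();
    ops.add(new Operation(bad, new Context("first")));
    // Later operations may still be running when the first failure is seen.
    ops.add(new Operation(CompletableFuture.supplyAsync(() -> "ok"), new Context("second")));
    ops.add(new Operation(CompletableFuture.completedFuture("ok"), new Context("third")));

    // Only contexts with a non-null error reach the (hypothetical) error handler.
    for (Context c : awaitAllAndCollectFailed(ops)) {
      System.out.println("failed: " + c.name + " (" + c.error.getMessage() + ")");
    }
  }
}
```

In this sketch the handler only ever receives contexts whose `error` is set, so a null check like the one in `retrieveErrorDetails` could not fail. The second suggestion (pass only the currently awaited failed operation) would be the same loop calling the handler inside the `catch` block instead of collecting a list.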
