scwhittle opened a new issue, #23063:
URL: https://github.com/apache/beam/issues/23063

   ### What happened?
   
   The following exception is thrown when using the BQ sink, which causes pipeline performance to suffer.
   
   java.lang.IllegalStateException
   at org.apache.beam.sdk.util.Preconditions.checkStateNotNull(Preconditions.java:452)
   at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWriteUnshardedRecords$WriteRecordsDoFn$DestinationState.lambda$retrieveErrorDetails$4(StorageApiWriteUnshardedRecords.java:375)
   at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
   
   The code expects that when RetryManager calls the onError function with the list of operations, every operation in that list has an error. If a Context does not have an error, its error is null and the exception above is thrown.
   
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java#L356
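
To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the Beam code: `Context`, `strictDetails`, and `tolerantDetails` are hypothetical names chosen for illustration, and the only assumption carried over from the issue is that a context's error is null unless that operation is known to have failed.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class ErrorDetailsSketch {
  /** Hypothetical stand-in for a per-operation context. */
  static final class Context {
    final String name;
    final Throwable error; // null if the operation succeeded or has not completed yet

    Context(String name, Throwable error) {
      this.name = name;
      this.error = error;
    }
  }

  /** Strict variant: assumes every context carries an error and throws otherwise. */
  static String strictDetails(List<Context> contexts) {
    return contexts.stream()
        .map(c -> {
          if (c.error == null) {
            throw new IllegalStateException("no error recorded for " + c.name);
          }
          return c.error.toString();
        })
        .collect(Collectors.joining(", "));
  }

  /** Tolerant variant: skips contexts that have no recorded error. */
  static String tolerantDetails(List<Context> contexts) {
    return contexts.stream()
        .map(c -> c.error)
        .filter(Objects::nonNull)
        .map(Throwable::toString)
        .collect(Collectors.joining(", "));
  }
}
```

With a list in which only one context has an error, `strictDetails` throws `IllegalStateException`, while `tolerantDetails` reports just the one failure. Skipping null errors would hide the symptom but not the underlying question of which operations the handler should receive.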
   
   That assumption doesn't seem to hold, because RetryManager passes the contexts of all of the operations to the error handler.
   
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/RetryManager.java#L264
   
   This doesn't seem thread-safe, because the later operations in the list have not yet been awaited. They may not have completed, they may have completed successfully, or they may have completed with an error.
   It seems that either every operation should be awaited before evaluating retries and errors, or only the currently awaited operation that failed should be passed to the error handler.
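
The first option could look roughly like the sketch below. It is only an illustration using `CompletableFuture`, not a patch against RetryManager: `Operation` and `awaitAllAndCollectFailures` are hypothetical names, and the real class tracks its operations differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class AwaitAllSketch {
  /** Hypothetical stand-in for an in-flight operation and its context. */
  static final class Operation {
    final String name;
    final CompletableFuture<Void> future;
    Throwable error; // written only by the awaiting thread, after the future completes

    Operation(String name, CompletableFuture<Void> future) {
      this.name = name;
      this.future = future;
    }
  }

  /**
   * Awaits every operation first, then returns only the ones that failed,
   * so an error handler never sees a context whose error is still null.
   */
  static List<Operation> awaitAllAndCollectFailures(List<Operation> operations) {
    List<Operation> failed = new ArrayList<>();
    for (Operation op : operations) {
      try {
        op.future.join(); // blocks until this operation has completed
      } catch (CompletionException | CancellationException e) {
        Throwable cause = e.getCause();
        op.error = (cause != null) ? cause : e;
        failed.add(op);
      }
    }
    return failed;
  }
}
```

Because each error is recorded only after the corresponding future has completed, the handler receives a list in which every entry has a non-null error. The cost is that the handler runs only once the slowest operation finishes, which may or may not be acceptable for the retry logic.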
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-gcp

