reuvenlax commented on a change in pull request #16906:
URL: https://github.com/apache/beam/pull/16906#discussion_r816105102



##########
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
##########
@@ -404,22 +428,27 @@ public String toString() {
                   "Got error " + failedContext.getError() + " closing " + 
failedContext.streamName);
               clearClients.accept(contexts);
               appendFailures.inc();
-              if (statusCode.equals(Code.OUT_OF_RANGE) || 
statusCode.equals(Code.ALREADY_EXISTS)) {
+              // This means that the offset we have stored does not match the 
current end of
+              // the stream in the Storage API. Usually this happens because a 
crash or a bundle
+              // failure
+              // happened after an append but before the worker could 
checkpoint it's
+              // state. The records that were appended in a failed bundle will 
be retried,
+              // meaning that the unflushed tail of the stream must be 
discarded to prevent
+              // duplicates.
+              boolean offsetMismatch =
+                  statusCode.equals(Code.OUT_OF_RANGE) || 
statusCode.equals(Code.ALREADY_EXISTS);
+              // This implies that the stream doesn't exist or has already 
been finalized. In this
+              // case we have no
+              // choice but to create a new stream.
+              boolean streamDoesntExist = 
statusCode.equals(Code.INVALID_ARGUMENT);

Review comment:
       Does it? This seems like a bug if it's the case, as FAILED_PRECONDITION 
is generally used for errors that are fixable by the user (e.g. creating a file 
in a directory that does not exists), while this is an error that will always 
continue regardless of what the user does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to