reuvenlax commented on a change in pull request #16906:
URL: https://github.com/apache/beam/pull/16906#discussion_r816105102
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
##########
@@ -404,22 +428,27 @@ public String toString() {
"Got error " + failedContext.getError() + " closing " +
failedContext.streamName);
clearClients.accept(contexts);
appendFailures.inc();
- if (statusCode.equals(Code.OUT_OF_RANGE) ||
statusCode.equals(Code.ALREADY_EXISTS)) {
+ // This means that the offset we have stored does not match the
current end of
+ // the stream in the Storage API. Usually this happens because a
crash or a bundle
+ // failure
+ // happened after an append but before the worker could
checkpoint it's
+ // state. The records that were appended in a failed bundle will
be retried,
+ // meaning that the unflushed tail of the stream must be
discarded to prevent
+ // duplicates.
+ boolean offsetMismatch =
+ statusCode.equals(Code.OUT_OF_RANGE) ||
statusCode.equals(Code.ALREADY_EXISTS);
+ // This implies that the stream doesn't exist or has already
been finalized. In this
+ // case we have no
+ // choice but to create a new stream.
+ boolean streamDoesntExist =
statusCode.equals(Code.INVALID_ARGUMENT);
Review comment:
Does it? This seems like a bug if it's the case, as FAILED_PRECONDITION
is generally used for errors that are fixable by the user (e.g. creating a file
in a directory that does not exists), while this is an error that will always
continue regardless of what the user does.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]