yirutang commented on a change in pull request #16906:
URL: https://github.com/apache/beam/pull/16906#discussion_r816343787
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
##########
@@ -404,22 +428,27 @@ public String toString() {
"Got error " + failedContext.getError() + " closing " +
failedContext.streamName);
clearClients.accept(contexts);
appendFailures.inc();
- if (statusCode.equals(Code.OUT_OF_RANGE) ||
statusCode.equals(Code.ALREADY_EXISTS)) {
+ // This means that the offset we have stored does not match the
current end of
+ // the stream in the Storage API. Usually this happens because a
crash or a bundle
+ // failure
+ // happened after an append but before the worker could
checkpoint it's
+ // state. The records that were appended in a failed bundle will
be retried,
+ // meaning that the unflushed tail of the stream must be
discarded to prevent
+ // duplicates.
+ boolean offsetMismatch =
+ statusCode.equals(Code.OUT_OF_RANGE) ||
statusCode.equals(Code.ALREADY_EXISTS);
+ // This implies that the stream doesn't exist or has already
been finalized. In this
+ // case we have no
+ // choice but to create a new stream.
+ boolean streamDoesntExist =
statusCode.equals(Code.INVALID_ARGUMENT);
Review comment:
Yes, I think FAILED_PRECONDITION is possible for SREAM_ALREADY_FINALIZED
and NOT_FOUND can sometimes be returned for STREAM_NOT_FOUND. Hope we could
have better specific exceptions soon. Currently StreamFinalized exception
should already be thrown.
https://github.com/googleapis/java-bigquerystorage/blob/main/google-cloud-bigquerystorage/src/main/java/com/google/cloud/bigquery/storage/v1/Exceptions.java#L98
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]