[
https://issues.apache.org/jira/browse/FLINK-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567823#comment-16567823
]
ASF GitHub Bot commented on FLINK-10020:
----------------------------------------
tzulitai commented on a change in pull request #6482: [FLINK-10020] [kinesis] Support recoverable exceptions in listShards.
URL: https://github.com/apache/flink/pull/6482#discussion_r207447268
##########
File path:
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
##########
@@ -433,6 +440,16 @@ private ListShardsResult listShards(String streamName, @Nullable String startSha
             } catch (ExpiredNextTokenException expiredToken) {
                 LOG.warn("List Shards has an expired token. Reusing the previous state.");
                 break;
+            } catch (SdkClientException ex) {
+                if (isRecoverableSdkClientException(ex)) {
+                    long backoffMillis = fullJitterBackoff(
+                        listShardsBaseBackoffMillis, listShardsMaxBackoffMillis, listShardsExpConstant, attemptCount++);
+                    LOG.warn("Got SdkClientException when listing shards from stream {}. Backing off for {} millis.",
+                        streamName, backoffMillis);
+                    Thread.sleep(backoffMillis);
Review comment:
I'm wondering what kind of `SdkClientException`s there are. Do we really
need to have a backoff here before retrying?
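
For context on the backoff question, here is a minimal sketch of what a full-jitter helper with the four-argument shape used in the diff might look like. The class name `FullJitterBackoffSketch` and the parameter names are assumptions made for illustration; this is not claimed to be the code in `KinesisProxy`. The idea is that the wait ceiling grows exponentially with the attempt number, is capped at a maximum, and the actual sleep is drawn uniformly below that ceiling so that many consumers retrying at once do not hit the service in lockstep.

```java
import java.util.Random;

/** Illustrative sketch only; class and parameter names are hypothetical. */
final class FullJitterBackoffSketch {

    private static final Random SEED = new Random();

    /**
     * Returns a random wait time in milliseconds.
     *
     * @param baseMillis wait ceiling for attempt 0
     * @param maxMillis  upper bound on the wait ceiling
     * @param power      growth factor applied per attempt
     * @param attempt    zero-based retry attempt number
     */
    static long fullJitterBackoff(long baseMillis, long maxMillis, double power, int attempt) {
        // Exponential ceiling: base * power^attempt, capped at maxMillis.
        long ceiling = (long) Math.min(maxMillis, baseMillis * Math.pow(power, attempt));
        // Full jitter: pick uniformly between 0 (inclusive) and the ceiling (exclusive).
        return (long) (SEED.nextDouble() * ceiling);
    }
}
```

With a base of 1000 ms, a cap of 5000 ms, and a power of 1.5, the ceiling is 1000 ms for attempt 0 and reaches the 5000 ms cap from attempt 4 onward; the returned value is always somewhere below the current ceiling.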
----------------------------------------------------------------
> Kinesis Consumer listShards should support more recoverable exceptions
> ----------------------------------------------------------------------
>
> Key: FLINK-10020
> URL: https://issues.apache.org/jira/browse/FLINK-10020
> Project: Flink
> Issue Type: Improvement
> Components: Kinesis Connector
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: pull-request-available
>
> Currently transient errors in listShards make the consumer fail and cause the
> entire job to reset. That is unnecessary for certain exceptions (like status
> 503 errors). It should be possible to control the exceptions that qualify for
> retry, similar to getRecords/isRecoverableSdkClientException.
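
The request in the description, making the set of retryable exceptions controllable, can be sketched generically as follows. This is only an illustration under stated assumptions: `RecoverableRetrySketch`, `callWithRetry`, and the fixed `backoffMillis` delay are hypothetical names and simplifications, and the predicate stands in for a check such as `isRecoverableSdkClientException`. It does not use the AWS SDK and does not reflect the connector's actual code.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Illustrative sketch only; all names here are hypothetical. */
final class RecoverableRetrySketch {

    /**
     * Invokes the call, retrying only when the predicate classifies the
     * thrown exception as recoverable and attempts remain.
     */
    static <T> T callWithRetry(
            Callable<T> call,
            Predicate<Exception> isRecoverable,
            int maxAttempts,
            long backoffMillis) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return call.call();
            } catch (Exception ex) {
                attempt++;
                // Rethrow if the error is not recoverable or attempts are exhausted.
                if (!isRecoverable.test(ex) || attempt >= maxAttempts) {
                    throw ex;
                }
                // Wait before the next attempt (fixed delay to keep the sketch short).
                Thread.sleep(backoffMillis);
            }
        }
    }
}
```

Passing the classification in as a predicate keeps the retry loop unchanged when the list of exceptions considered transient grows or shrinks.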