[
https://issues.apache.org/jira/browse/FLINK-35815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krzysztof Dziolak updated FLINK-35815:
--------------------------------------
Description:
*Problem:*
We have observed missing retrys on throttling for DescribeStreamSummary calls
from Kinesis.
{code:java}
org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute
application.
...
Caused by: java.util.concurrent.CompletionException:
org.apache.flink.util.FlinkRuntimeException: Could not execute application.
...
Caused by: org.apache.flink.util.FlinkRuntimeException: Could not execute
application.
...
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error: Error registering stream: <edited>
...
Caused by:
org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil$FlinkKinesisStreamConsumerRegistrarException:
Error registering stream: <edited>
...
Caused by:
org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.LimitExceededException:
Rate exceeded for stream <edited>. (Service: Kinesis, Status Code: 400,
Request ID: efe4c2a9-3c3b-9c1c-b0ed-d9b05db93be2, Extended Request ID:
pSG6kwQXgPWD2S7YoPT4RKf+g8QbRBaxc0grhNz6juEoti/uGUQTzyqsfmFCLSHoM+u1ydHvqxzsv/0ICUid6aTAQdndy2EO)
...
at
org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.lambda$describeStreamSummary$0(KinesisProxySyncV2.java:91)
at
org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.invokeWithRetryAndBackoff(KinesisProxySyncV2.java:175)
at
org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.describeStreamSummary(KinesisProxySyncV2.java:90)
at
org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.StreamConsumerRegistrar.registerStreamConsumer(StreamConsumerRegistrar.java:92)
at
org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil.registerStreamConsumers(StreamConsumerRegistrarUtil.java:121)
... {code}
*Why does it get stuck?*
*[https://github.com/apache/flink-connector-aws/blob/c716ca439b2c8e6d4b5905a03c867c418e031688/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AwsV2Util.java#L77]*
The `isRecoverableException` check validates the cause of the exception only,
but it doesn't inspect the actual exception being evaluated for retriability.
In this particular case, LimitExceededException is thrown without wrappers and
it appears that this case is not handled correctly.
was:
*Problem:*
We have observed missing retrys on throttling for DescribeStreamSummary calls
from Kinesis.
{code:java}
org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute
application.
...
Caused by: java.util.concurrent.CompletionException:
org.apache.flink.util.FlinkRuntimeException: Could not execute application.
...
Caused by: org.apache.flink.util.FlinkRuntimeException: Could not execute
application.
...
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error: Error registering stream: <edited>
...
Caused by:
org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil$FlinkKinesisStreamConsumerRegistrarException:
Error registering stream: <edited>
...
Caused by:
org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.LimitExceededException:
Rate exceeded for stream <edited>. (Service: Kinesis, Status Code: 400,
Request ID: efe4c2a9-3c3b-9c1c-b0ed-d9b05db93be2, Extended Request ID:
pSG6kwQXgPWD2S7YoPT4RKf+g8QbRBaxc0grhNz6juEoti/uGUQTzyqsfmFCLSHoM+u1ydHvqxzsv/0ICUid6aTAQdndy2EO){code}
*Why does it get stuck?*
The `isRecoverableException` check validates the cause of the exception only,
but it doesn't inspect the actual exception being evaluated for retriability.
In this particular case, LimitExceededException is thrown without wrappers and
it appears that this case is not handled correctly.
> KinesisProxySyncV2 doesn't always retry throttling exceptions.
> ---------------------------------------------------------------
>
> Key: FLINK-35815
> URL: https://issues.apache.org/jira/browse/FLINK-35815
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kinesis
> Affects Versions: aws-connector-4.2.0, aws-connector-4.3.0, 1.19.1
> Reporter: Krzysztof Dziolak
> Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> *Problem:*
> We have observed missing retrys on throttling for DescribeStreamSummary calls
> from Kinesis.
> {code:java}
> org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute
> application.
> ...
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.util.FlinkRuntimeException: Could not execute application.
> ...
> Caused by: org.apache.flink.util.FlinkRuntimeException: Could not execute
> application.
> ...
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The
> main method caused an error: Error registering stream: <edited>
> ...
> Caused by:
> org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil$FlinkKinesisStreamConsumerRegistrarException:
> Error registering stream: <edited>
> ...
> Caused by:
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.LimitExceededException:
> Rate exceeded for stream <edited>. (Service: Kinesis, Status Code: 400,
> Request ID: efe4c2a9-3c3b-9c1c-b0ed-d9b05db93be2, Extended Request ID:
> pSG6kwQXgPWD2S7YoPT4RKf+g8QbRBaxc0grhNz6juEoti/uGUQTzyqsfmFCLSHoM+u1ydHvqxzsv/0ICUid6aTAQdndy2EO)
> ...
> at
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.lambda$describeStreamSummary$0(KinesisProxySyncV2.java:91)
> at
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.invokeWithRetryAndBackoff(KinesisProxySyncV2.java:175)
> at
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.describeStreamSummary(KinesisProxySyncV2.java:90)
> at
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.StreamConsumerRegistrar.registerStreamConsumer(StreamConsumerRegistrar.java:92)
> at
> org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil.registerStreamConsumers(StreamConsumerRegistrarUtil.java:121)
> ... {code}
>
>
> *Why does it get stuck?*
> *[https://github.com/apache/flink-connector-aws/blob/c716ca439b2c8e6d4b5905a03c867c418e031688/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AwsV2Util.java#L77]*
> The `isRecoverableException` check validates the cause of the exception only,
> but it doesn't inspect the actual exception being evaluated for retriability.
> In this particular case, LimitExceededException is thrown without wrappers
> and it appears that this case is not handled correctly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)