[jira] [Updated] (FLINK-39003) Inaccurate millisBehindLatest metric from the Kinesis source connector on failover

Emre Kartoglu (Jira) Fri, 30 Jan 2026 08:43:06 -0800


     [ https://issues.apache.org/jira/browse/FLINK-39003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Emre Kartoglu updated FLINK-39003:
----------------------------------
    Description: 
The `millisBehindLatest` metric emitted by the Kinesis source connector

 

```
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kinesis</artifactId>
    <version>5.0.0-1.20</version>
</dependency>
```

appears to be inaccurate on app failover.

Attached are 2 screenshots from AWS CloudWatch, taken around the same timestamp, 
when the app (running on Amazon MSF) was restarting continuously. One screenshot 
shows the `millisBehindLatest` emitted by the DynamoDB (DDB) connector, and the 
other shows the same metric from the Kinesis connector. The metrics from the DDB 
connector look fairly accurate: within 1-2 minutes of the outage, we had 81 
seconds of latency. However, the metric from the Kinesis connector shows ~3854 
seconds of latency within 1-2 minutes of continuous app restarts, even though 
the same metric was 0 or close to 0 right before the restarts. 
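
As a rough plausibility check of the numbers above, the reported lag can be 
compared against the outage duration. This is only a sketch, not connector 
code: the helper name `lag_ratio` is hypothetical, and it assumes the source 
was caught up before the restarts, so lag should grow by about one second per 
second of outage. It ignores any replay from the last checkpoint.

```python
# Illustrative sanity check only; not part of the Kinesis or DDB connector.
# Assumption: lag was ~0 before the restarts, so the expected lag is at most
# roughly the outage duration (replay from the last checkpoint is ignored).

def lag_ratio(reported_lag_s: float, outage_s: float) -> float:
    """Return the reported lag as a multiple of the outage duration."""
    if outage_s <= 0:
        raise ValueError("outage_s must be positive")
    return reported_lag_s / outage_s


# "within 1-2 minutes": use the 2-minute upper bound from the report.
OUTAGE_UPPER_S = 120

ddb_ratio = lag_ratio(81, OUTAGE_UPPER_S)        # DynamoDB connector value
kinesis_ratio = lag_ratio(3854, OUTAGE_UPPER_S)  # Kinesis connector value

print(f"DDB: {ddb_ratio:.2f}x outage, Kinesis: {kinesis_ratio:.2f}x outage")
```

Even against the 2-minute upper bound, the DDB value stays below the outage 
duration, while the Kinesis value is more than 30 times larger.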

The issue may well be caused by the AWS managed service, but the accurate DDB 
metrics in the same system suggest that this might be a Kinesis connector bug. 

 

!Screenshot 2026-01-30 at 12.40.35.png!

  was:
The `millisBehindLatest` metric emitted by the Kinesis source connector

 

```

 <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kinesis</artifactId>
            <version>5.0.0-1.2</version>
</dependency>

```

appears to be inaccurate on app failover.

Attached are 2 screenshots from AWS Cloudwatch around the same timestamp when 
the app (running on Amazon MSF) had continuous restarts. One of the screenshots 
shows the `millisBehindLatest` emitted by the DynamoDB (DDB) connector, and the 
other one shows that from the Kinesis connector. The metrics from the DDB 
connector look fairly accurate, i.e. within 1-2 minutes of outage, we had 81 
seconds of latency. However the metric from the Kinesis connector shows ~3854 
seconds of latency within 1-2 minutes of app continuous restarts. When the same 
metric was 0 or close to 0 right before the restarts. 

The issue may well be caused by the AWS managed service, but the accurate DDB 
metrics in the same system suggests a possibility that this might be a Kinesis 
connector bug. 

 

!Screenshot 2026-01-30 at 12.40.35.png!


> Inaccurate millisBehindLatest metric from the Kinesis source connector on 
> failover
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-39003
>                 URL: https://issues.apache.org/jira/browse/FLINK-39003
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: aws-connector-5.0.0
>            Reporter: Emre Kartoglu
>            Assignee: Emre Kartoglu
>            Priority: Major
>              Labels: AWS
>         Attachments: Screenshot 2026-01-30 at 12.40.35.png, Screenshot 
> 2026-01-30 at 12.47.00.png
>
>
> The `millisBehindLatest` metric emitted by the Kinesis source connector
>  
> ```
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-kinesis</artifactId>
>     <version>5.0.0-1.20</version>
> </dependency>
> ```
> appears to be inaccurate on app failover.
> Attached are 2 screenshots from AWS CloudWatch, taken around the same 
> timestamp, when the app (running on Amazon MSF) was restarting continuously. 
> One screenshot shows the `millisBehindLatest` emitted by the DynamoDB (DDB) 
> connector, and the other shows the same metric from the Kinesis connector. 
> The metrics from the DDB connector look fairly accurate: within 1-2 minutes 
> of the outage, we had 81 seconds of latency. However, the metric from the 
> Kinesis connector shows ~3854 seconds of latency within 1-2 minutes of 
> continuous app restarts, even though the same metric was 0 or close to 0 
> right before the restarts. 
> The issue may well be caused by the AWS managed service, but the accurate DDB 
> metrics in the same system suggest that this might be a Kinesis connector 
> bug. 
>  
> !Screenshot 2026-01-30 at 12.40.35.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
