[ 
https://issues.apache.org/jira/browse/FLINK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766401#comment-17766401
 ] 

Rui Fan edited comment on FLINK-33109 at 9/18/23 2:19 PM:
----------------------------------------------------------

Hi [~YordanPavlov] , thanks for your report. I'm not sure whether this bug has 
been fixed. Could you try the 1.17.2 or 1.18.0? Note: these 2 versions aren't 
released so far. You need to try it locally or build a version by yourself.

I want you try the 1.17.2 or 1.18.0, because we have fixed a series of bugs 
over the past few months, you can get detailed information from FLINK-32548.

If your job doesn't work well with the latest code, it may be a new bug. Would 
you like to fix it? If yes, I can assign this ticket to you and I can help you 
review. If no, I'm be happy to analysis this bug and fix it.

 

BTW, FLINK-32548 (Flink-1.18) marks the watermark alignment ready for 
production use, so I hope it can be fixed asap if it's indeed a bug. It's 
better to fix this bug before 1.18.0 is released, thanks :)


was (Author: fanrui):
Hi [~YordanPavlov] , thanks for your report. I'm not sure whether this bug has 
been fixed. Could you try the 1.17.2 or 1.18.0? Note: these 2 versions aren't 
released so far. You need to try it locally or build a version by yourself.

I want you try the 1.17.2 or 1.18.0, because we have fixed a series of bugs 
voer the past few months, you can get detailed information from FLINK-32548.

If your job doesn't work well with the latest code, it may be a new bug. Would 
you like to fix it? If yes, I can assign this ticket to you and I can help you 
review. If no, I'm be happy to analysis this bug and fix it.

> Watermark alignment not applied after recovery from checkpoint
> --------------------------------------------------------------
>
>                 Key: FLINK-33109
>                 URL: https://issues.apache.org/jira/browse/FLINK-33109
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.1
>            Reporter: Yordan Pavlov
>            Priority: Major
>         Attachments: image-2023-09-18-15-40-06-868.png, 
> image-2023-09-18-15-46-16-106.png
>
>
> I am observing a problem where after recovery from a checkpoint the Kafka 
> source watermarks would start to diverge not honoring the watermark alignment 
> setting I have applied.
> I have a Kafka source which reads a topic with 32 partitions. I am applying 
> the following watermark strategy:
> {code:java}
> new EventAwareWatermarkStrategy[KeyedKafkaSourceMessage]](msg => 
> msg.value.getTimestamp)
>       .withWatermarkAlignment("alignment-sources-group", 
> time.Duration.ofMillis(sourceWatermarkAlignmentBlocks)){code}
>  
> This works great up until my job needs to recover from checkpoint. Once the 
> recovery takes place, no alignment is taking place any more. This can best be 
> illustrated by looking at the watermark metrics for various operators in the 
> image:
> !image-2023-09-18-15-40-06-868.png!
>  
> You can see how the watermarks disperse after the recovery. Trying to debug 
> the problem I noticed that before the failure there would be calls in
>  
> {code:java}
> SourceCoordinator::announceCombinedWatermark() 
> {code}
> after the recovery, no calls get there, so no value for 
> {code:java}
> watermarkAlignmentParams.getMaxAllowedWatermarkDrift(){code}
> is ever read. I can manually fix the problem If I stop the job, clear all 
> state from Zookeeper and then manually start Flink providing the last 
> checkpoint with 
> {code:java}
> '–fromSavepoint'{code}
>  flag. This would cause the SourceCoordinator to be constructed properly and 
> watermark drift to be checked. Once recovery manually watermarks would again 
> converge to the allowed drift as seen in the metrics:
> !image-2023-09-18-15-46-16-106.png!
>  
> Let me know If I can be helpful by providing any more information.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to