[jira] [Commented] (FLINK-21839) SinkFunction snapshotState don't snapshot all data when trigger a stop-with-drain savepoint

Yuan Mei (Jira) Tue, 13 Apr 2021 00:29:40 -0700


    [ https://issues.apache.org/jira/browse/FLINK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319985#comment-17319985 ]


Yuan Mei commented on FLINK-21839:
----------------------------------

Hey [~lintingbin], with the info provided in the ticket, I cannot tell whether 
this is a bug or not. 

I think that's the expected behavior (not saying it is the right behavior, but 
expected). Before diving into the details, would you mind sharing a bit more 
info about your pipeline? Is it using the old source or the new source 
(FLIP-27)? What does your application look like? Are there any suspicious logs?


The reason why `TaskSink` still gets `invoke()` calls after the snapshot:

The `stop-with-drain` procedure consists of two stages:
1. Insert max_watermark and create the savepoint.
2. Stop the source, which triggers the `END_OF_PARTITION` event that stops the job.

In the gap between "draining the window state and creating the savepoint" and 
"handling END_OF_PARTITION", data can still flow in from the source. 

??      SourceStreamTask#finishTask
        /**
         * Currently stop with savepoint relies on the EndOfPartitionEvents propagation and performs
         * clean shutdown after the stop with savepoint (which can produce some records to process
         * after the savepoint while stopping). If we interrupt source thread, we might leave the
         * network stack in an inconsistent state. So, if we want to relay on the clean shutdown, we
         * can not interrupt the source thread.
         */??

You can refer to FLINK-21133 and FLIP-147 for a bit more detail.
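
To make the ordering above concrete, here is a minimal plain-Java simulation of 
the two stages and the gap between them. It does not use any Flink classes: 
`StopWithDrainSketch`, `BufferingSink`, `flushOnClose` and `run` are 
hypothetical names made up for this illustration, and the record counts simply 
mirror the log in the ticket. Flushing again in close() is only an assumption 
about how a buffering sink might pick up the trailing record, not something the 
ticket or Flink prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class StopWithDrainSketch {

    /** Hypothetical stand-in for a buffering sink; this is NOT Flink's SinkFunction API. */
    static class BufferingSink {
        private final boolean flushOnClose;
        private final List<Integer> buffer = new ArrayList<>();
        final List<Integer> delivered = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        BufferingSink(boolean flushOnClose) {
            this.flushOnClose = flushOnClose;
        }

        void invoke(int value) {
            log.add("invoke: " + value);
            buffer.add(value);
        }

        void snapshotState() {
            log.add("snapshotState");
            flush();
        }

        void close() {
            log.add("sink close");
            if (flushOnClose) {
                flush(); // assumption: the sink also flushes when it is closed
            }
        }

        private void flush() {
            delivered.addAll(buffer);
            buffer.clear();
        }
    }

    /** Replays the call order from the ticket's log against the simulated sink. */
    static BufferingSink run(boolean flushOnClose) {
        BufferingSink sink = new BufferingSink(flushOnClose);
        // Records that arrive before the savepoint is taken.
        for (int i = 0; i <= 427; i++) {
            sink.invoke(i);
        }
        // Stage 1: max_watermark has been inserted and the savepoint is created.
        sink.snapshotState();
        // Gap: the source has not been stopped yet, so one more record arrives.
        sink.invoke(428);
        // Stage 2: the source stops, END_OF_PARTITION propagates, the sink closes.
        sink.close();
        return sink;
    }

    public static void main(String[] args) {
        BufferingSink snapshotOnly = run(false);
        BufferingSink snapshotAndClose = run(true);
        System.out.println("flush in snapshotState only:      "
                + snapshotOnly.delivered.size() + " records delivered");
        System.out.println("flush in snapshotState and close: "
                + snapshotAndClose.delivered.size() + " records delivered");
    }
}
```

In this simulation the sink that flushes only in snapshotState() delivers 428 of 
the 429 records, while the variant that also flushes in close() delivers all of 
them. Whether a flush in close() is acceptable for a real sink depends on its 
delivery guarantees.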

> SinkFunction snapshotState don't snapshot all data when trigger a 
> stop-with-drain savepoint
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21839
>                 URL: https://issues.apache.org/jira/browse/FLINK-21839
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.2
>            Reporter: Darcy Lin
>            Assignee: Yuan Mei
>            Priority: Critical
>         Attachments: TestSink.java
>
>
> This problem was discovered while I was developing Flink code. In my Flink 
> code, my custom sink doesn't send all the data produced by an event_time 
> window when a stop-with-drain savepoint is triggered.
> TestSink.java is an example in which SinkFunction invoke() continues to run after 
> snapshotState() has executed when a stop-with-drain savepoint is triggered via the REST API.
> {code:java}
> //TaskSink.java log
> sink open
> invoke: 0
> invoke: 1
> invoke: 2
> invoke: 3
> invoke: 4
> invoke: 5
> invoke: 6
> invoke: 7
> invoke: 8
> invoke: 9
> ...
> invoke: 425
> invoke: 426
> invoke: 427
> snapshotState
> invoke: 428 // It should be executed before snapshotState.
> sink close{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
