[ https://issues.apache.org/jira/browse/FLINK-18894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176351#comment-17176351 ]
Igal Shilman commented on FLINK-18894:
--------------------------------------
So it seems the problem is in the way the feedback loop and the mailbox
interact:
* The sync checkpoint comes through the feedback union operator.
* The feedback union operator passes it to the functions operator, but then
starts yielding at
org.apache.flink.streaming.runtime.tasks.StreamTask.runSynchronousSavepointMailboxLoop()
* Then the feedback sink operator receives the barrier and translates it into a
checkpoint sentinel message, which goes into the feedback union operator's
mailbox.
Since the feedback union operator is yielding, it is not able to pick up
the task that would complete the checkpoint.
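
To make the sequence above concrete, here is a minimal toy model of the stall.
It is only a sketch and does not use or reproduce Flink's actual mailbox
implementation: the names MailboxStallSketch, ToyMailbox and simulate are
hypothetical, and the reason the waiting loop fails to reach the sentinel is
abstracted into a single boolean flag. It only shows the shape of the problem:
the mail that would end the wait sits in the mailbox owned by the waiter.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the stall described above. Not Flink code; all names are hypothetical. */
public class MailboxStallSketch {

    /** Single-threaded mailbox: a mail runs only when the owner takes it out. */
    static final class ToyMailbox {
        private final Queue<Runnable> mails = new ArrayDeque<>();

        void put(Runnable mail) {
            mails.add(mail);
        }

        /** Runs one queued mail, if any. Returns false when the mailbox is empty. */
        boolean runOneMail() {
            Runnable mail = mails.poll();
            if (mail == null) {
                return false;
            }
            mail.run();
            return true;
        }
    }

    /**
     * Simulates the feedback union operator waiting for the sync savepoint.
     *
     * @param waitLoopPicksUpSentinel assumption knob: whether the waiting loop
     *        is able to pick up mails from the operator's own mailbox
     * @param maxSteps bound on the wait, so the stalled case terminates here
     * @return true if the checkpoint completed within maxSteps
     */
    static boolean simulate(boolean waitLoopPicksUpSentinel, int maxSteps) {
        ToyMailbox unionMailbox = new ToyMailbox();
        boolean[] checkpointCompleted = {false};

        // Steps 1 and 2: the barrier passed through the union operator, which
        // now waits. Step 3: the feedback sink turns the barrier into a
        // sentinel that lands in the union operator's own mailbox.
        unionMailbox.put(() -> checkpointCompleted[0] = true);

        // Bounded stand-in for the waiting loop.
        for (int step = 0; step < maxSteps && !checkpointCompleted[0]; step++) {
            if (waitLoopPicksUpSentinel) {
                unionMailbox.runOneMail();
            }
            // Otherwise the sentinel stays queued and the wait never ends.
        }
        return checkpointCompleted[0];
    }

    public static void main(String[] args) {
        System.out.println("sentinel not picked up, completed: " + simulate(false, 1000));
        System.out.println("sentinel picked up, completed: " + simulate(true, 1000));
    }
}
```

In this model the checkpoint completes only when the waiting loop can reach
the sentinel mail, which is the condition that appears to be violated in the
stalled job.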
> StateFun job stalls on stop-with-savepoint
> ------------------------------------------
>
> Key: FLINK-18894
> URL: https://issues.apache.org/jira/browse/FLINK-18894
> Project: Flink
> Issue Type: Bug
> Components: Stateful Functions
> Affects Versions: statefun-2.1.0, statefun-2.2.0
> Reporter: Seth Wiesman
> Assignee: Igal Shilman
> Priority: Blocker
> Attachments: stacktrace.txt
>
>
> Stateful Functions jobs stall when performing a stop-with-savepoint. The
> FunctionDispatchOperator never completes the sync portion of the savepoint.
> Taking a savepoint and then canceling in two separate steps works correctly;
> only the stop command has issues.
> {code}
> curl -X POST localhost:8001/jobs/:jobid/stop -d '{"drain": false}'
> {code}