[ https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509231#comment-17509231 ]
Aitozi commented on FLINK-26719:
--------------------------------
> If we do not want to provide stronger resiliency/guarantees than the Flink
> native integration in itself then I guess we do not need to check, or it's
> enough to check at larger intervals.
I understand the general idea now. In other words, we use the reconcile loop to
do the periodic check and plan to produce ERROR events, right?
I think it's an interesting feature to explore; it could become a monitoring or
self-healing capability of the operator. The monitoring could use either
polling or an informer-based technique.
Thanks for the explanation, everyone. Let's see how this capability
evolves :).
> Rethink the default reschedule reconcile loop
> ---------------------------------------------
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
> Issue Type: Sub-task
> Reporter: Aitozi
> Priority: Major
>
> When I tested locally, I found that the operator reschedules and reconciles
> at the interval set by {{operator.reconciler.reschedule.interval.sec}}. I
> wonder why we need this. I think we only need to reconcile in these cases:
> # waiting for a status change
> # receiving a new event
> # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not need to trigger a
> reconcile unless we are waiting for the savepoint result.
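
The three cases in the description above could be sketched roughly as follows. This is only an illustration of the proposal, not the operator's actual code: `ReschedulePolicy`, its nested `DeploymentStatus` enum, and `nextReconcile` are hypothetical names, and the enum is a simplified stand-in for JobManagerDeploymentStatus.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical sketch of the proposed rescheduling rule; not the operator's real API. */
final class ReschedulePolicy {

    /** Simplified, assumed stand-in for the operator's JobManagerDeploymentStatus. */
    enum DeploymentStatus { DEPLOYING, DEPLOYED_NOT_READY, READY }

    private ReschedulePolicy() {}

    /**
     * Returns the delay before the next timed reconcile, or empty when the
     * operator can simply wait for the next resource event.
     */
    static Optional<Duration> nextReconcile(
            DeploymentStatus status, boolean savepointInProgress, Duration pollInterval) {
        // Case 1: the deployment is not READY yet, so keep polling for a status change.
        if (status != DeploymentStatus.READY) {
            return Optional.of(pollInterval);
        }
        // Case 3: a savepoint was triggered and its result is still pending.
        if (savepointInProgress) {
            return Optional.of(pollInterval);
        }
        // Case 2 needs no timer: a new event triggers reconciliation by itself.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(60);
        System.out.println(nextReconcile(DeploymentStatus.DEPLOYING, false, interval));
        System.out.println(nextReconcile(DeploymentStatus.READY, true, interval));
        System.out.println(nextReconcile(DeploymentStatus.READY, false, interval));
    }
}
```

Under this sketch, a deployment that is Ready with no pending savepoint is never rescheduled on a timer, which is the behavior the description asks for. The periodic check discussed in the comment would correspond to returning a (possibly larger) interval in that last branch instead of empty.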