[
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508697#comment-17508697
]
Gyula Fora commented on FLINK-26719:
------------------------------------
I agree that in an ideal case, once we reach a READY deployment state and the
job is running, we could technically stop periodic reconciliation.
There are a few caveats here which tie into what [~wangyang0918] is suggesting.
How much do we trust that a running Flink Deployment will be able to self-heal
and recover?
If it goes into a crash loop or a broken state, is there anything the operator
can do anyway?
If we expect to be able to react to broken deployments, then to guarantee SLAs
we actually need frequent rechecks. If we do not want to provide stronger
resiliency/guarantees than the Flink native integration itself, then I guess
we do not need to check, or it's enough to check at larger intervals.
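As a rough illustration of the "check at larger intervals" option, the sketch below picks a reschedule delay from the observed state. It is only a sketch under stated assumptions: the class, enum, and parameter names are hypothetical and are not taken from the operator code base, and the interval values are placeholders.

```java
import java.time.Duration;

// Hypothetical sketch: none of these names exist in the operator code base.
final class ReschedulePolicy {

    // Simplified stand-in for the deployment status tracked by the operator.
    enum DeploymentState { DEPLOYING, READY, ERROR }

    private final Duration fastInterval; // used while something is still in flight
    private final Duration slowInterval; // used once the deployment is READY and the job runs

    ReschedulePolicy(Duration fastInterval, Duration slowInterval) {
        this.fastInterval = fastInterval;
        this.slowInterval = slowInterval;
    }

    // Returns how long to wait before the next reconcile attempt.
    Duration nextDelay(DeploymentState state, boolean jobRunning, boolean savepointInProgress) {
        // A healthy, running job with no pending savepoint only needs occasional rechecks.
        if (state == DeploymentState.READY && jobRunning && !savepointInProgress) {
            return slowInterval;
        }
        // Everything else (deploying, error, pending savepoint) keeps the frequent recheck.
        return fastInterval;
    }
}
```

With this shape, a stricter SLA simply means a shorter slow interval; setting both intervals to the same value reproduces a fixed reschedule interval.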
With the current logic, the best we would do is trigger an ERROR event, but we
wouldn't try to "repair" broken deployments. That is still valuable if the user
is listening to these events, though. I'm not sure what alternatives we have
other than the reconcile loop. Maybe, as [~matyas] said, listening to events or
informers could be an alternative, but it's still far from an actual functional
observe loop.
> Rethink the default reschedule reconcile loop
> ---------------------------------------------
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
> Issue Type: Sub-task
> Reporter: Aitozi
> Priority: Major
>
> When I tested locally, I found that it reschedules and reconciles at the
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we
> need this. I think we just need to reconcile when
> # waiting for a status change
> # receiving a new event
> # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the
> reconcile, except when waiting for the savepoint result.