Hi Gyula,

Thanks for driving this discussion. I second Yang Wang's idea that it's better to make the `validator`, `observer` and `reconciler` self-contained. I would also prefer to define the `Observer` as an interface, together with the set of statuses that the `Observer` exposes. Those statuses then act as the protocol between the `Observer` and the `Reconciler`.
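To make the "statuses as protocol" idea concrete, here is a minimal Java sketch. All names (`ObservedStatus`, `JmDeploymentStatus`, `Action`, `reconcile`) are hypothetical and not taken from the operator code base. The observer is stubbed instead of querying Kubernetes, and the sketch assumes Java 16+ for records.

```java
import java.util.Optional;

// Hypothetical sketch: every type and name below is illustrative,
// not the actual flink-kubernetes-operator API.
class ObserverProtocolSketch {

    // Statuses the observer is allowed to expose for the JobManager deployment.
    enum JmDeploymentStatus { MISSING, DEPLOYING, READY }

    // The "protocol": everything the reconciler may know, exported by the observer.
    record ObservedStatus(JmDeploymentStatus jmStatus, boolean restPortReady, Optional<String> jobState) {}

    // The observer only inspects the cluster and reports; it never acts on what it sees.
    interface Observer {
        ObservedStatus observe(String deploymentName);
    }

    // Decisions the reconciler can take (hypothetical).
    enum Action { DEPLOY, WAIT, SUBMIT_JOB, NOTHING }

    // The reconciler works from the exported status alone; it holds no Observer reference.
    static Action reconcile(ObservedStatus status) {
        if (status.jmStatus() == JmDeploymentStatus.MISSING) {
            return Action.DEPLOY;
        }
        if (status.jmStatus() == JmDeploymentStatus.DEPLOYING || !status.restPortReady()) {
            return Action.WAIT; // JobManager or its REST endpoint is not ready yet
        }
        // JobManager is ready: submit a job unless one is already reported.
        return status.jobState().isPresent() ? Action.NOTHING : Action.SUBMIT_JOB;
    }

    public static void main(String[] args) {
        // A stub observer standing in for real Kubernetes and REST checks.
        Observer observer = name -> new ObservedStatus(JmDeploymentStatus.READY, true, Optional.empty());
        ObservedStatus status = observer.observe("basic-example");
        System.out.println(reconcile(status)); // prints SUBMIT_JOB
    }
}
```

Because the reconciler's only input is the exported status, it can be tested with hand-built statuses, and the same status object could feed other consumers such as event reporting.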
Best,
Aitozi

Yang Wang <danrtsey...@gmail.com> wrote on Mon, Feb 28, 2022 at 16:28:
> Thanks for posting the discussion here.
>
> Having the components `validator`, `observer` and `reconciler` makes a lot of sense, and the "Validate -> Observe -> Reconcile" flow seems natural to me.
>
> Regarding the implementation in the PR, instead of directly using the observer in the reconciler, I lean towards letting the observer export its results to the status (e.g. JobManager deployment status, REST port readiness, Flink job status, etc.) and having the reconciler read them from the status. Then each component is more self-contained and the boundaries will be clearer.
>
> Best,
> Yang
>
> Gyula Fóra <gyf...@apache.org> wrote on Mon, Feb 28, 2022 at 16:01:
> > Hi All!
> >
> > I would like to start a discussion thread regarding the structure of the Kubernetes Operator <https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java> controller flow. Based on some recent PR discussions, we have no clear consensus on the structure and the expectations, which can potentially lead to back-and-forth changes and unnecessary complexity.
> >
> > *Background*
> > In the initial prototype we had a very basic flow:
> > 1. Observe Flink job status
> > 2. (if observation successful) Reconcile changes
> > 3. Reschedule reconcile with success/error
> >
> > This basic prototype flow could not cover all requirements and did not allow for things like waiting until the JobManager deployment is ready.
> >
> > To solve these shortcomings, some changes were introduced recently here <https://github.com/apache/flink-kubernetes-operator/pull/21>. While this change introduced many improvements and safeguards, it also completely changed the original controller flow.
> > Now the reconciler is responsible for ensuring that it can actually reconcile, by checking the deployment and ports. The job status observation logic has also been moved into the actual reconcile logic.
> >
> > *Discussion Question*
> > What controller flow would we like to have? Do we want to separate the observer from the reconciler or keep them together?
> >
> > In my personal view, we should try to adopt a very simple flow to keep the operator clean and modular. If possible, I would like to restore the original flow with some modifications:
> >
> > 1. Validate deployment object
> > 2. Observe deployment and Flink job status -> return comprehensive status info
> > 3. Reconcile deployment based on observed status and resource changes
> > (Both 2 and 3 should be able to reschedule immediately if necessary.)
> >
> > I think the Observer component should be able to describe the current status of the deployment objects and the Flink job to the extent that the reconciler can work with that information alone. If we do it this way, we can also use the status information that the observer provides to produce other events and to aid operations like shutdown that depend on the current deployment status.
> >
> > I think this would satisfy our needs, but I might be missing something that cannot be done if we structure the code this way.
> >
> > I have a PR open <https://github.com/apache/flink-kubernetes-operator/pull/26/commits> which includes some of these proposed changes (as the optional second commit), so that you can easily compare it with the current state of the operator.
> >
> > Please let us know what you think!
> >
> > Cheers,
> > Gyula
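The Validate -> Observe -> Reconcile flow proposed above can be sketched in Java as a single controller pass. This is a minimal illustration under stated assumptions, not the operator's implementation: `Spec`, `Status`, `Observer`, `Result` and `runOnce` are hypothetical, "reschedule" is modeled as a returned `Duration`, and the 5s/60s intervals are made-up values. Records require Java 16+.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch of the proposed Validate -> Observe -> Reconcile flow.
// Spec, Status, Observer and Result are illustrative stand-ins, not operator classes.
class ControllerFlowSketch {

    record Spec(String image, int parallelism) {}

    // Comprehensive status returned by the observer (step 2).
    record Status(boolean jmReady, Optional<Spec> lastReconciledSpec) {}

    interface Observer {
        Status observe(Spec spec);
    }

    // Outcome of one controller pass: when to run again and why.
    record Result(Duration rescheduleAfter, String reason) {}

    static final Duration IMMEDIATELY = Duration.ZERO;
    static final Duration SHORT_POLL = Duration.ofSeconds(5);        // hypothetical value
    static final Duration DEFAULT_INTERVAL = Duration.ofSeconds(60); // hypothetical value

    // Step 1: reject invalid specs before touching the cluster.
    static Optional<String> validate(Spec spec) {
        if (spec.image() == null || spec.image().isBlank()) {
            return Optional.of("image must be set");
        }
        if (spec.parallelism() < 1) {
            return Optional.of("parallelism must be >= 1");
        }
        return Optional.empty();
    }

    // Step 3: decide using only the desired spec and the observed status.
    static Result reconcile(Spec spec, Status status) {
        if (!status.lastReconciledSpec().equals(Optional.of(spec))) {
            // A real reconciler would deploy or upgrade here; re-observe right away.
            return new Result(IMMEDIATELY, "spec changed, upgrade triggered");
        }
        return new Result(DEFAULT_INTERVAL, "in sync");
    }

    // One pass of the controller loop; steps 2 and 3 can both reschedule.
    static Result runOnce(Spec spec, Observer observer) {
        Optional<String> error = validate(spec);
        if (error.isPresent()) {
            return new Result(DEFAULT_INTERVAL, "invalid spec: " + error.get());
        }
        Status status = observer.observe(spec);
        if (!status.jmReady()) {
            // The observed status alone says there is nothing to reconcile yet.
            return new Result(SHORT_POLL, "waiting for JobManager");
        }
        return reconcile(spec, status);
    }

    public static void main(String[] args) {
        Spec spec = new Spec("flink:1.14", 2);
        // Stub observer: JobManager is up, nothing reconciled yet.
        Observer observer = s -> new Status(true, Optional.empty());
        Result result = runOnce(spec, observer);
        System.out.println(result.reason() + " -> " + result.rescheduleAfter()); // prints "spec changed, upgrade triggered -> PT0S"
    }
}
```

Keeping `reconcile` a function of the spec and the observed status only mirrors the goal that the reconciler can work with the observer's information alone; the readiness short-circuit in `runOnce` is one possible place for the observe step to reschedule.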