Hi, Gyula
      Thanks for driving this discussion. I second Yang Wang's idea that
it's better to make the `validator`, `observer` and `reconciler`
self-contained. I also prefer to define the `Observer` as an interface and
we could define the statuses that `Observer` will expose. It acts like the
observer protocol between the `Observer` and `Reconciler`.

Best,
Aitozi.

Yang Wang <danrtsey...@gmail.com> 于2022年2月28日周一 16:28写道:

> Thanks for posting the discussion here.
>
>
> Having the components `validator` `observer` `reconciler` makes lots of
> sense. And the "Validate -> Observe -> Reconcile"
> flow seems natural to me.
>
> Regarding the implementation in the PR, instead of directly using the
> observer in the reconciler, I lean to let the observer
> exports the results to the status(e.g. jobmanager deployment status, rest
> port readiness, flink jobs status, etc.) and
> the reconciler reads it from the status. Then each component is more
> self-contained and the boundary will be clearer.
>
>
> Best,
> Yang
>
> Gyula Fóra <gyf...@apache.org> 于2022年2月28日周一 16:01写道:
>
> > Hi All!
> >
> > I would like to start a discussion thread regarding the structure of
> > the Kubernetes
> > Operator
> > <
> >
> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> > >
> > controller
> > flow. Based on some recent PR discussions we have no clear consensus on
> the
> > structure and the expectations which can potentially lead to back and
> forth
> > changes and unnecessary complexity.
> >
> > *Background*
> > In the initial prototype we had a very basic flow:
> >  1. Observe flink job status
> >  2. (if observation successful) reconcile changes
> >  3. Reschedule reconcile with success/error
> >
> > This basic prototype flow could not cover all requirements and did not
> > allow for things like waiting until Jobmanager deployment is ready etc.
> >
> > To solve these shortcomings, some changes were introduced recently here
> > <https://github.com/apache/flink-kubernetes-operator/pull/21>. While
> this
> > change introduced many improvements and safeguards it also completely
> > changed the original controller flow. Now the reconciler is responsible
> for
> > ensuring that it can actually reconcile by checking the deployment and
> > ports. The job status observation logic has also been moved into the
> actual
> > reconcile logic.
> >
> >
> > *Discussion Question*What controller flow would we like to have? Do we
> want
> > to separate the observer from the reconciler or keep them together?
> >
> > In my personal view, we should try to adopt a very simple flow to make
> the
> > operator clean and modular. If possible I would like to restore the
> > original flow with some modifications:
> >
> >  1. Validate deployment object
> >  2. Observe deployment and flink job status -> Return comprehensive
> status
> > info
> >  3. Reconcile deployment based on observed status and resource changes
> >  (Both 2/3 should be able to reschedule immediately if necessary)
> >
> > I think the Observer component should be able to describe the current
> > status of the deployment objects and the flink job to the extent that the
> > reconciler can work with that information alone. If we do it this way, we
> > can also use the status information that the observer provides to produce
> > other events and aid operations like shutdown which depend on the current
> > deployment status.
> >
> > I think this would satisfy our needs, but I might be missing something
> that
> > cannot be done if we structure the code this way.
> >
> > I have a PR open
> > <https://github.com/apache/flink-kubernetes-operator/pull/26/commits>
> > which
> > includes some of these proposed changes (as the optional second commit)
> so
> > that you can easily compare with the current state of the operator.
> >
> > Please let us know what we think!
> >
> > Cheers,
> > Gyula
> >
>

Reply via email to