zhengcanbin opened a new pull request #11233: [FLINK-16194][k8s] Refactor the Kubernetes decorator design
URL: https://github.com/apache/flink/pull/11233

## What is the purpose of the change

So far, Flink has made efforts toward native integration with Kubernetes. However, it is essential to keep evaluating the existing design and to consider alternatives that are cleaner and easier to maintain in the long run. We have run into several problems while developing new features based on the current code. Here are some of them:

1. We don't have a unified monadic-step based orchestrator architecture to construct all the Kubernetes resources.
2. We don't have dedicated objects or tools for centrally parsing, verifying, and managing the Kubernetes parameters, which has caused maintenance and inconsistency issues.

The ultimate goal of this PR is to evolve some of these designs. Here is a summary of the main changes:

1. Introduce a unified monadic-step based orchestrator architecture that provides a better, cleaner, and more consistent abstraction for the Kubernetes resource construction process.
2. Introduce dedicated tools for centrally parsing, verifying, and managing the Kubernetes parameters.

## Open Questions

1. In the new design, we change the owner used for GC from the internal Service to the Deployment. There are several considerations:
   - We do not need the internal Service to forward requests from the TaskManager to the JobManager in HA mode; that Service will be removed for that scenario in another issue.
   - Resources like the Deployment are first-class citizens in Kubernetes, so it is reasonable that deleting the controller that runs the master also cleans up all the other resources that together represent the application.
2. In the new design, we don't listen for the **ADD** event when creating the rest Service. The previous design assumes that the Service is ready once the client receives the **ADD** event. However, this is incorrect for both the LB and the NodePort types.
We plan to open another issue to fix this problem.

## Brief change log

Main changes are:

- [e629bbc](https://github.com/apache/flink/commit/e629bbc4091e9288f74e2d6a9cfd689daabeb4a3): Trivial code clean-up and test code normalization.
- [675151e](https://github.com/apache/flink/commit/675151e02d7b91e0736963ed2b24f2b8c3ff7046): Remove the existing decorator design patterns.
- [0355f0a](https://github.com/apache/flink/commit/0355f0a6d95bca530b4018fecf11db7560626956): Refactor and simplify KubernetesTestBase.
- [edc3d23](https://github.com/apache/flink/commit/edc3d23742d64dcbd07b24b72b674c17ce06b6e7): Remove the Flink Configuration from KubernetesResource.
- [8d6e520](https://github.com/apache/flink/commit/8d6e5201b336a6238923292f96b9f0563a4f9029): Introduce dedicated Kubernetes parameter parsing tools.
- [c41a9a2](https://github.com/apache/flink/commit/c41a9a2b5a5f23b820cae038a0611d2d071c4ce9) to [23ed312](https://github.com/apache/flink/commit/23ed31201c0cd08d6bdb715721bba857ead2b520): Introduce the new Kubernetes decorator design pattern.
- [710984c](https://github.com/apache/flink/commit/710984c24f05169a2c1e644676e05dbc573e6c3a): Rework the FlinkKubeClient to employ the new decorator pattern.
- [a47f12d](https://github.com/apache/flink/commit/a47f12d54f832485ad54f273bc5a2f4901d4dce7) to [fb57917](https://github.com/apache/flink/commit/fb57917227853d0477aa1383d399a619146d7170): Minor improvements.

## Verifying this change

This PR adds several test classes and many unit tests that cover most of the code branches of the newly introduced decorator design pattern.
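
To make the two ideas from the summary concrete (step-by-step decoration of a resource, and one central object that parses and verifies parameters), here is a minimal, self-contained Java sketch. It is not code from this PR: every class and method name (`DecoratorSketch`, `PodParameters`, `PodSpec`, `Step`, `build`) is hypothetical, the configuration keys are used as plain illustrative strings, and a toy `PodSpec` stands in for the real Kubernetes resource objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these names come from the PR. */
final class DecoratorSketch {

    /** Central place that parses and verifies raw configuration values. */
    static final class PodParameters {
        private final Map<String, String> conf;

        PodParameters(Map<String, String> conf) {
            this.conf = conf;
        }

        String getImage() {
            String image = conf.get("kubernetes.container.image"); // illustrative key
            if (image == null || image.trim().isEmpty()) {
                throw new IllegalArgumentException("container image must be set");
            }
            return image;
        }

        Map<String, String> getLabels() {
            Map<String, String> labels = new HashMap<>();
            // Falls back to "flink" when no cluster id is configured (assumed default).
            labels.put("app", conf.getOrDefault("kubernetes.cluster-id", "flink"));
            return labels;
        }
    }

    /** Immutable toy stand-in for the resource under construction. */
    static final class PodSpec {
        final String image;
        final Map<String, String> labels;

        PodSpec(String image, Map<String, String> labels) {
            this.image = image;
            this.labels = Collections.unmodifiableMap(new HashMap<>(labels));
        }

        PodSpec withImage(String newImage) {
            return new PodSpec(newImage, labels);
        }

        PodSpec withLabels(Map<String, String> extra) {
            Map<String, String> merged = new HashMap<>(labels);
            merged.putAll(extra);
            return new PodSpec(image, merged);
        }
    }

    /** One decoration step: takes the current resource, returns an enriched copy. */
    interface Step {
        PodSpec decorate(PodSpec pod);
    }

    /** Applies every step in order, threading the result of one into the next. */
    static PodSpec build(PodParameters params) {
        List<Step> steps = Arrays.asList(
                pod -> pod.withImage(params.getImage()),
                pod -> pod.withLabels(params.getLabels()));
        PodSpec pod = new PodSpec(null, new HashMap<String, String>());
        for (Step step : steps) {
            pod = step.decorate(pod);
        }
        return pod;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("kubernetes.container.image", "flink:latest");
        conf.put("kubernetes.cluster-id", "demo");
        PodSpec pod = build(new PodParameters(conf));
        System.out.println(pod.image + " " + pod.labels);
    }
}
```

In this sketch the steps never read the raw configuration themselves; they only ask the parameters object, so validation lives in one place and each step can be unit-tested in isolation.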
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
