Hello everyone! I found a problem in DolphinScheduler (DS) and would like to propose a solution. Any advice is welcome. Thank you all!
Describe the question

The worker load balancing in the dev branch is a good feature. It is based on the `weight` and `start time` of the worker:

- `weight` is configured by `worker.weight`
- `start time` is set when the worker registers with ZooKeeper

The ZooKeeper registration path of the worker is `/dolphinscheduler/nodes/worker/default/<ip>:<port>:<weight>:<startTime>`, for example `/dolphinscheduler/nodes/worker/default/198.18.0.1:1234:100:1615022079945`, which differs from `/dolphinscheduler/nodes/worker/default/<ip>:<port>` in the 1.3.x release. Both values are used in the classes `RandomHostManager` and `RoundRobinHostManager` to calculate the weight of the worker and select the best worker to dispatch a task to.

However, because `weight` and `start time` are placed in the ZooKeeper registration path of the worker, several problems are introduced:

- Every place that depends on or refers to the `/dolphinscheduler/nodes/worker/default/<ip>:<port>` path is affected, and each needs additional work to fix:
  - worker fault tolerance #4757
  - worker `unRegistry`
  - worker `handleDeadServer`
- It causes confusion (Picture 1, Picture 2).
- The design of the class `Host` ([source code](https://github.com/apache/incubator-dolphinscheduler/blob/dev/dolphinscheduler-remote/src/main/java/org/apache/dolphinscheduler/remote/utils/Host.java)) is unreasonable. The attributes `weight`, `startTime`, and `workGroup` should not be placed in this class, because they invite misuse and even potential bugs.
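To make the path problem concrete, here is a minimal sketch of why code written against the 1.3.x node name stops matching once `weight` and `start time` are appended. The class `WorkerNodeNameSketch` and its helpers `addressOf` and `weightOf` are hypothetical names for illustration only, not DolphinScheduler code, and the sketch assumes IPv4 addresses so that `:` can serve as the field separator.

```java
import java.util.Arrays;
import java.util.List;

public class WorkerNodeNameSketch {

    /**
     * Hypothetical helper: extracts "<ip>:<port>" from either node-name format.
     * Assumes an IPv4 address, so ':' only appears as a field separator.
     */
    static String addressOf(String nodeName) {
        String[] parts = nodeName.split(":");
        if (parts.length != 2 && parts.length != 4) {
            throw new IllegalArgumentException("unexpected worker node name: " + nodeName);
        }
        return parts[0] + ":" + parts[1];
    }

    /**
     * Hypothetical helper: returns the weight embedded in a four-field name,
     * or the supplied default for a two-field (1.3.x style) name.
     */
    static int weightOf(String nodeName, int defaultWeight) {
        String[] parts = nodeName.split(":");
        return parts.length == 4 ? Integer.parseInt(parts[2]) : defaultWeight;
    }

    public static void main(String[] args) {
        String devNodeName = "198.18.0.1:1234:100:1615022079945";
        String address = "198.18.0.1:1234";

        // Children of /dolphinscheduler/nodes/worker/default as the dev branch registers them.
        List<String> children = Arrays.asList(devNodeName);

        // A 1.3.x style lookup by "<ip>:<port>" no longer finds the worker.
        System.out.println("lookup by address: " + children.contains(address));

        // Every caller now has to strip the extra fields before comparing.
        System.out.println("after parsing: " + addressOf(devNodeName).equals(address));
        System.out.println("embedded weight: " + weightOf(devNodeName, 100));
    }
}
```

Each of the places listed above (fault tolerance, `unRegistry`, `handleDeadServer`) would need this kind of parsing step, which is the extra work the dev-branch path format creates.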
Improvement Solution

- Keep the same registration path as the 1.3.x release, `/dolphinscheduler/nodes/worker/default/<ip>:<port>`, so that all of the problems above and many potential ones are avoided.
- Place `weight` in the znode data of `/dolphinscheduler/nodes/worker/default/<ip>:<port>`, which keeps compatibility with the 1.3.x version.
- `startTime` is already included in the znode data, so it only needs to be read from there.
- Remove the attributes `weight`, `startTime`, and `workGroup` from the class `Host`, and perhaps introduce a new class to hold them. This avoids misuse of the class `Host`.

Which version of DolphinScheduler

- [dev]

Related issue: https://github.com/apache/incubator-dolphinscheduler/issues/4984

Best Regards
--
DolphinScheduler(Incubator) Contributor
Shiwen Cheng 程世文
Mobile: (+86)15201523580
Email: [email protected]
