chengshiwen opened a new issue #4984:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4984


   **Describe the question**
   
   The worker load balance solution in the dev branch is a good feature, and 
it's based on the `weight` and `start time` of the worker.
   
   - `weight` is configured by `worker.weight`
   - `start time` is set when the worker is registered to zookeeper
   
   The zookeeper registration path of the worker is 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>:<weight>:<startTime>`, for 
example 
`/dolphinscheduler/nodes/worker/default/198.18.0.1:1234:100:1615022079945`, 
which is different from `/dolphinscheduler/nodes/worker/default/<ip>:<port>` in 
the 1.3.x release.
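
To make the difference between the two node-name formats concrete, here is a minimal sketch that splits a worker znode name into its parts. The class `WorkerNodeName` and its `parse` method are hypothetical names used only for illustration (they are not part of DolphinScheduler), and the sketch assumes IPv4 addresses so that `:` can serve as the separator.

```java
/** Hypothetical helper, not part of DolphinScheduler: splits a worker znode name. */
final class WorkerNodeName {
    final String address;  // "<ip>:<port>", identical in both formats
    final int weight;      // -1 when the name carries no weight (1.3.x format)
    final long startTime;  // -1 when the name carries no start time (1.3.x format)

    private WorkerNodeName(String address, int weight, long startTime) {
        this.address = address;
        this.weight = weight;
        this.startTime = startTime;
    }

    /** Accepts a full znode path or a bare node name. */
    static WorkerNodeName parse(String zkPath) {
        // Keep only the last path segment, e.g. "198.18.0.1:1234:100:1615022079945".
        String name = zkPath.substring(zkPath.lastIndexOf('/') + 1);
        String[] parts = name.split(":");
        if (parts.length != 2 && parts.length != 4) {
            throw new IllegalArgumentException("unexpected worker node name: " + name);
        }
        String address = parts[0] + ":" + parts[1];
        if (parts.length == 2) {
            // 1.3.x format: <ip>:<port>
            return new WorkerNodeName(address, -1, -1L);
        }
        // dev format: <ip>:<port>:<weight>:<startTime>
        return new WorkerNodeName(address, Integer.parseInt(parts[2]), Long.parseLong(parts[3]));
    }

    public static void main(String[] args) {
        String base = "/dolphinscheduler/nodes/worker/default/";
        WorkerNodeName dev = parse(base + "198.18.0.1:1234:100:1615022079945");
        WorkerNodeName old = parse(base + "198.18.0.1:1234");
        // Both formats describe the same worker address.
        System.out.println(dev.address.equals(old.address));
    }
}
```

Any code that looks up a worker by `<ip>:<port>` alone would need this kind of extra parsing step once the weight and start time are embedded in the node name.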
   
   Both of them are used in the classes `RandomHostManager`, 
`RoundRobinHostManager` and `LowerWeightHostManager` to calculate the weight of 
the worker and select the best worker to dispatch tasks to.
   
   However, because the `weight` and `start time` are placed in the zookeeper 
registration path of the worker, some problems are introduced:
   
   - Every place that depends on or refers to the 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>` path is affected, and 
each of them needs additional work to fix:
     - worker fault tolerance #4757
     - worker `unRegistry` 
     - worker `handleDeadServer`
     - the result is confusing, as shown in the pictures below:
   Picture 1:
   
![image](https://user-images.githubusercontent.com/4902714/110206106-d5243680-7eb6-11eb-8493-2685c1c9f9fe.png)
   Picture 2:
   
![image](https://user-images.githubusercontent.com/4902714/110206102-d2294600-7eb6-11eb-9084-552c48e79e0b.png)
   - The design of the class `Host` ([source 
code](https://github.com/apache/incubator-dolphinscheduler/blob/dev/dolphinscheduler-remote/src/main/java/org/apache/dolphinscheduler/remote/utils/Host.java))
 is unreasonable. The attributes `weight`, `startTime`, and `workGroup` should 
not be placed in this class, since they invite misuse and even potential bugs.
   
   **What are the current deficiencies and the benefits of improvement**
   
   - Keep using the same registration path 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>` as in the 1.3.x release, so 
that all of the problems mentioned above, and many potential ones, can be avoided
   - Place `weight` in the znode data of 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>`, which keeps the path 
compatible with the 1.3.x version
   - `startTime` is already included in the znode data, so it only needs to be read
   - Remove the attributes `weight`, `startTime`, and `workGroup` from the class 
`Host`, and perhaps introduce a new class to handle these attributes. This will 
avoid misuse of the class `Host`
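
The last point could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Host` shown here is a simplified stand-in for the real class, and `WorkerNode` is a hypothetical name for the new class that would carry `weight` and `startTime` after they are read from the znode data.

```java
import java.util.Objects;

/** Simplified stand-in for Host: address only, matching the 1.3.x znode name. */
final class Host {
    private final String ip;
    private final int port;

    Host(String ip, int port) {
        this.ip = Objects.requireNonNull(ip);
        this.port = port;
    }

    /** Returns "<ip>:<port>", i.e. the znode name under the proposed scheme. */
    String getAddress() {
        return ip + ":" + port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Host)) return false;
        Host other = (Host) o;
        return port == other.port && ip.equals(other.ip);
    }

    @Override
    public int hashCode() {
        return Objects.hash(ip, port);
    }
}

/** Hypothetical class: load-balancing attributes kept outside of Host. */
final class WorkerNode {
    private final Host host;
    private final int weight;      // assumed to be read from the znode data
    private final long startTime;  // assumed to be read from the znode data

    WorkerNode(Host host, int weight, long startTime) {
        this.host = Objects.requireNonNull(host);
        this.weight = weight;
        this.startTime = startTime;
    }

    Host getHost() { return host; }
    int getWeight() { return weight; }
    long getStartTime() { return startTime; }

    public static void main(String[] args) {
        WorkerNode a = new WorkerNode(new Host("198.18.0.1", 1234), 100, 1615022079945L);
        WorkerNode b = new WorkerNode(new Host("198.18.0.1", 1234), 50, 1615022080000L);
        // Host identity does not depend on weight or start time.
        System.out.println(a.getHost().equals(b.getHost()));
    }
}
```

With this split, host equality and path lookups depend only on `<ip>:<port>`, so changing a worker's weight or restarting it would not change how it is identified.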
   
   **Which version of DolphinScheduler:**
    -[dev]
   

