Hi everyone, are there any further suggestions for this improvement?

If not, I have submitted a PR for it; please refer to
https://github.com/apache/incubator-dolphinscheduler/pull/4996

Thank you all!

Best Regards

--
DolphinScheduler(Incubator) Contributor
Shiwen Cheng 程世文
Mobile: (+86)15201523580
Email: [email protected]



------------------------------------------------------------------
From: lidong dai <[email protected]>
Sent: Saturday, March 6, 2021, 21:48
To: dev <[email protected]>; 程世文 <[email protected]>
Subject: Re: Worker Load Balance Improvement

Hi,
  I take your point. Maybe introduce a new class called “ServerInfo”?


Best Regards
---------------
DolphinScheduler(Incubator) PPMC
Lidong Dai 
[email protected]
---------------

On Sat, Mar 6, 2021 at 8:35 PM 程世文 <[email protected]> wrote:
Hello everyone!

 I found a problem in DS and would like to propose a solution. Any advice is 
welcome. Thank you all!


 Describe the question

 The worker load balance solution in the dev branch is a good feature, and it's 
based on the `weight` and `start time` of the worker.

 - `weight` is configured by `worker.weight`
 - `start time` is set when the worker is registered to zookeeper

 The zookeeper registration path of the worker is 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>:<weight>:<startTime>`, for 
example 
`/dolphinscheduler/nodes/worker/default/198.18.0.1:1234:100:1615022079945`. 
This differs from the path `/dolphinscheduler/nodes/worker/default/<ip>:<port>` 
used in the 1.3.x release.
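
 To illustrate the difference between the two path formats, here is a minimal 
Java sketch. The class `WorkerNodeName` and its methods are hypothetical and 
not part of DolphinScheduler, and the sketch assumes IPv4 addresses (no `:` 
inside the IP). It shows that code expecting `<ip>:<port>` has to strip the 
extra `<weight>:<startTime>` fields before it can recover the worker address.

```java
/** Hypothetical helper (not DolphinScheduler code): parses the last segment of a worker znode path. */
final class WorkerNodeName {
    final String ip;
    final int port;
    final Integer weight;   // null when the name uses the 1.3.x format <ip>:<port>
    final Long startTime;   // null when the name uses the 1.3.x format <ip>:<port>

    private WorkerNodeName(String ip, int port, Integer weight, Long startTime) {
        this.ip = ip;
        this.port = port;
        this.weight = weight;
        this.startTime = startTime;
    }

    /** Accepts either <ip>:<port> or <ip>:<port>:<weight>:<startTime>; assumes an IPv4 address. */
    static WorkerNodeName parse(String znodePath) {
        // Keep only the text after the last '/', i.e. the node name.
        String name = znodePath.substring(znodePath.lastIndexOf('/') + 1);
        String[] parts = name.split(":");
        if (parts.length == 2) {
            return new WorkerNodeName(parts[0], Integer.parseInt(parts[1]), null, null);
        }
        if (parts.length == 4) {
            return new WorkerNodeName(parts[0], Integer.parseInt(parts[1]),
                    Integer.valueOf(parts[2]), Long.valueOf(parts[3]));
        }
        throw new IllegalArgumentException("Unexpected worker node name: " + name);
    }

    /** The <ip>:<port> form that the 1.3.x code paths compare against. */
    String address() {
        return ip + ":" + port;
    }
}
```

 Any caller that compares the raw node name with `<ip>:<port>` directly, 
without a step like `address()`, no longer matches once the weight and start 
time are appended.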

 Both of them are used in the classes `RandomHostManager` and 
`RoundRobinHostManager` to calculate the weight of the worker and select the 
best worker to dispatch a task to.

 However, because the `weight` and `start time` are placed in the zookeeper 
registration path of the worker, some problems are introduced:

 - Every place that depends on or refers to the 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>` path is affected, and 
extra work is needed to fix each of them:
   - worker fault tolerance #4757
   - worker `unRegistry` 
   - worker `handleDeadServer`
   - confusing displays, as in the following pictures:
 Picture 1:
 
![image](https://user-images.githubusercontent.com/4902714/110206106-d5243680-7eb6-11eb-8493-2685c1c9f9fe.png)
 Picture 2:
 
![image](https://user-images.githubusercontent.com/4902714/110206102-d2294600-7eb6-11eb-9084-552c48e79e0b.png)
 - The design of the class `Host` ([source code](https://github.com/apache/incubator-dolphinscheduler/blob/dev/dolphinscheduler-remote/src/main/java/org/apache/dolphinscheduler/remote/utils/Host.java)) 
is unreasonable. The attributes `weight`, `startTime`, and `workGroup` should 
not be placed in this class, because they invite misuse and even potential bugs.

 Improvement Solution

 - Keep the registration path 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>` used in the 1.3.x release, 
so that all of the problems above, and many potential ones, are avoided.
 - Store `weight` in the znode data of 
`/dolphinscheduler/nodes/worker/default/<ip>:<port>`, while staying compatible 
with the 1.3.x version.
 - `startTime` is already included in the znode data, so it only needs to be 
read from there.
 - Remove the attributes `weight`, `startTime`, and `workGroup` from the class 
`Host`, and perhaps introduce a new class to handle them. This avoids misuse 
of the class `Host`.
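
 The proposal above could be sketched roughly as follows. This is only an 
illustration under stated assumptions: the classes `HostAddress` and 
`WorkerInfo`, the default weight of 100, and the made-up znode payload 
`startTime,weight` are all hypothetical. The real znode data carries more 
fields in its own layout, and the actual class names are up to the 
implementation.

```java
/** Hypothetical address-only class: mirrors the 1.3.x node name <ip>:<port>. */
final class HostAddress {
    final String ip;
    final int port;

    HostAddress(String ip, int port) {
        this.ip = ip;
        this.port = port;
    }

    /** Parses "<ip>:<port>"; the port is whatever follows the last ':'. */
    static HostAddress of(String address) {
        int idx = address.lastIndexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("Expected <ip>:<port>, got: " + address);
        }
        return new HostAddress(address.substring(0, idx),
                Integer.parseInt(address.substring(idx + 1)));
    }
}

/** Hypothetical holder for load-balancing attributes read from the znode data. */
final class WorkerInfo {
    // Assumed fallback for payloads written without a weight field.
    static final int DEFAULT_WEIGHT = 100;

    final HostAddress address;
    final long startTime;
    final int weight;

    WorkerInfo(HostAddress address, long startTime, int weight) {
        this.address = address;
        this.startTime = startTime;
        this.weight = weight;
    }

    /**
     * Builds a WorkerInfo from the node name and its data.
     * Assumes a simplified payload "startTime[,weight]" for illustration only.
     */
    static WorkerInfo fromZnode(String nodeName, String data) {
        String[] fields = data.split(",");
        long startTime = Long.parseLong(fields[0].trim());
        int weight = fields.length > 1
                ? Integer.parseInt(fields[1].trim())
                : DEFAULT_WEIGHT;   // payload without a weight stays readable
        return new WorkerInfo(HostAddress.of(nodeName), startTime, weight);
    }
}
```

 With this split, code that only needs the worker address never sees the 
weight or start time, and a payload that lacks the weight field still parses.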

 Which version of DolphinScheduler
  -[dev]

 Related issue: https://github.com/apache/incubator-dolphinscheduler/issues/4984

 Best Regards

 --
 DolphinScheduler(Incubator) Contributor
 Shiwen Cheng 程世文
 Mobile: (+86)15201523580
 Email: [email protected]

