Sorry for not attach img
On Mon, Dec 9, 2019 at 7:19 PM guo jiwei <[email protected]> wrote:
>
> Hello everyone ,
>
> I would like to share some ideas about refactoring
> WorkerServer/MasterServer for dolphin-scheduler.
>
> Background
> For current implement of dolphin-scheduler, task info are stored in
> zookeeper , and worker-server is using zookeeper lock to keep executing
> task continuously. Each worker will try to acquire lock , and if it gets
> the lock, it has the ability to execute the task (fetch task from zk, get
> task info from db, and etc), or it has to wait for the lock . This is not a
> nice way, for performance, dependance or parallelism.
>
> Proposal
> I suggest worker-server execute task in a way like listening tcp port,
> and receive task command via rpc request instead. In this way,
> worker-server is not using lock or connecting db anymore. And it can
> execute task concurrently.
>
> General Implementation
> 1. Refactor worker-server as a tcp server listening some port using
> Netty for tcp communication. And we define our own binary protocol for
> scheduling.
> 2. After starting worker-server, register itself in zookeeper ,
> ephemeral node like
> /dolphinscheduler/nodes/worker/test/xxx.xxx.xxx.xxx:9800.
> {/dolphinscheduler/nodes/worker/$workerGroup/ip:port}
> 3. MasterServer has take the responsibility for trigger the task, and
> choose worker-server to execute it 。
> - first, we have to listen for worker nodes in zookeeper. And cache
> the worker list in memory .
> - second, when MasterScheduleThread acquire the lock to execute
> command(t_ds_command) , it will extract the task info process instance,
> then establish a tcp connection to target available worker using task info
> , and send the command to worker.
> 4. When worker-server receives a task command , it will deserialize the
> command into a Task, and execute in thread pool or subprocess.
> 5. After worker-server executes the task , it has to report the result
> using the pre-connected socket to MasterServer。but if the socket is closed
> in any way, WorkerServer has to connect to any other MasterServer to report
> the result.
>
>
> More detail for MasterServer/WorkerServer can be discuss in later mail
>
> Simple graph
>
> [image: image.png]
>
>
>
>
> Tboy
> [email protected]
>
> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=Tboy&uid=technotboy%40gmail.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22technoboy%40apache.org%22%5D>
>