Re: [Summer-2021][DISCUSS] : About adding task group queue function 210290601

David Dai Fri, 06 Aug 2021 16:26:55 -0700

Good job.
I like the feature very much. Please contact your mentor for further
advice. Thanks.




Best Regards



---------------
Apache DolphinScheduler PMC Chair
David Dai
[email protected]
Linkedin: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------


On Wed, Aug 4, 2021 at 5:26 PM Yin Rui <[email protected]> wrote:

> *1. the necessity of the task group queue*
> A task group queue (TGQ) provides cross-project and cross-process
> concurrency control of tasks, reducing resource pressure on the scheduling
> system and on other big data clusters.
> TGQ also supports priority-based control, which ensures that important
> tasks are executed first. Users can also force a task to run,
> bypassing the TGQ.
> *2. the details of TGQ*
> TGQ is essentially a flow limiter. It manages a pool of resources, and
> tasks must obtain resources from it before they run. In this way, the
> resources obtained by multiple tasks are limited and excessive pressure
> on worker nodes is avoided.
> A database optimistic lock is used to handle thread safety in the
> distributed, concurrent scenario.
> Note that some tasks are not bound by the TGQ:
> 1. Tasks that do not need to be executed by workers;
> 2. Tasks that do not belong to any TGQ;
> 3. Tasks that are forcibly started by the user.
> *2.1 init a TGQ*
> The user manually creates a TGQ. The size of the TGQ is specified by the
> user.
> *2.2 how a TGQ works*
> Each task configured with a TGQ applies for resources from the TGQ before
> being dispatched to a worker. If the TGQ has no available resources, the
> task is not delivered to the worker; it waits for resources to be
> released and then resends its request to the TGQ.
> *2.3 recycle resources*
> After receiving the response from the worker, TGQ will release the
> resources corresponding to the task.
> *2.4 fault tolerance*
> In a distributed architecture, the fault tolerance mechanism is
> important. When a worker node goes offline, the tasks with fault
> tolerance enabled that were running on that node are re-executed by the
> master. To prevent the same task from applying for resources twice, when
> a task applies for resources the TGQ should first check whether the task
> is already in the TGQ. If so, the task is resent to the worker without a
> new allocation. If not, resources are allocated.
>
>
> yinrui_ustb
> [email protected]
>
>
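
To make sections 2.2 through 2.4 of the proposal concrete, here is a minimal Java sketch of the acquire and release flow. It is only an illustration under stated assumptions, not the DolphinScheduler implementation: the class name TaskGroupQueueSketch and its methods are hypothetical, and an in-memory AtomicReference stands in for a database row with a version column, so that compareAndSet plays the role of an optimistic-lock UPDATE that only succeeds when the version is unchanged.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for one task group row in the database.
class TaskGroupQueueSketch {

    // Immutable snapshot of the row: slots in use plus a version counter.
    private static final class Row {
        final int used;
        final long version;

        Row(int used, long version) {
            this.used = used;
            this.version = version;
        }
    }

    private final int size; // capacity chosen by the user (section 2.1)
    private final AtomicReference<Row> row = new AtomicReference<>(new Row(0, 0L));
    // Ids of tasks that currently hold a slot (assumed bookkeeping for section 2.4).
    private final Set<Integer> holders = ConcurrentHashMap.newKeySet();

    TaskGroupQueueSketch(int size) {
        this.size = size;
    }

    // Returns true if the task may be dispatched to a worker.
    boolean tryAcquire(int taskId, boolean forced) {
        if (forced) {
            return true; // forced start bypasses the TGQ and takes no slot
        }
        if (!holders.add(taskId)) {
            return true; // already in the TGQ (failover re-dispatch): no second slot
        }
        while (true) {
            Row current = row.get();
            if (current.used >= size) {
                holders.remove(taskId);
                return false; // no free slot: caller waits and asks again later
            }
            Row next = new Row(current.used + 1, current.version + 1);
            // Optimistic step: succeeds only if the row is unchanged since it was read.
            if (row.compareAndSet(current, next)) {
                return true;
            }
            // Another thread updated the row first; re-read and retry.
        }
    }

    // Called after the worker reports the task result (section 2.3).
    void release(int taskId) {
        if (!holders.remove(taskId)) {
            return; // forced or unknown task: nothing to free
        }
        while (true) {
            Row current = row.get();
            Row next = new Row(current.used - 1, current.version + 1);
            if (row.compareAndSet(current, next)) {
                return;
            }
        }
    }

    int used() {
        return row.get().used;
    }
}
```

With a queue of size 2, a third ordinary task is refused until one of the first two is released, while a repeated request from a task that already holds a slot is accepted without consuming another one. Priority ordering among waiting tasks is deliberately left out of this sketch.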
