ic4y opened a new issue, #2279:
URL: https://github.com/apache/incubator-seatunnel/issues/2279

   ### Search before asking
   
   - [X] I had searched in the 
[feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   TaskExecutionServer is a service that executes Tasks and will run an 
instance on each node. It receives the TaskGroup from the JobMaster and runs 
the Task in it. And maintain TaskID->TaskContext, and the specific operations 
on Task are encapsulated in TaskContext. And Task holds OperationService 
internally, which means that Task can remotely call and communicate with other 
Tasks or JobMaster through OperationService.
   
   TaskGroup design :
        The tasks in a TaskGroup all run on the same node.
   <img width="432" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181298022-3cd6a80a-3c14-427c-a19b-48cbc1e7c0d8.png";>
   
   An optimization point: 
        The data channel between tasks within the same TaskGroup uses a local 
Queue. And the data channel between different TaskGroups may use a distributed 
Queue (hazalcast Ringbuffer) because it may be executed on different nodes.
   
   Task design:
           One of the most important methods of Task is call(), and the 
executor drives the operation of Task by calling call() of Task. call() will 
have a return value of ProgressState, through which the executor can determine 
whether the Task has ended and whether it needs to continue to call call(). As 
follows.
             
   <img width="301" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181298463-596825a0-3cd3-49fc-9fe4-e52ee3dcf27e.png";>
   
    Thread Share optimization: 
          Thread Share Background: In the scenario where a large number of 
small tasks are synchronized, a large number of tasks will be generated. If 
each Task is responsible for one thread, it will waste resources by running a 
large number of threads. At this time, if one thread can run multiple Tasks, 
this situation will be greatly improved. But how can one thread execute 
multiple tasks at the same time? Because the Task is internally driven by 
calling Call() again and again, a thread can call Call() of all Tasks it is 
responsible for in turn. As follows.
   
   <img width="337" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181300464-3363ef7c-b261-4f35-a5af-b44f45b52b63.png";>
   
   
   This will also bring a problem, that is, if the call() execution time of a 
task is very long. In this way, this thread will be used all the time, causing 
the delay of other tasks to be very serious.
   
   For such a problem, I temporarily think of the following two optimization 
solutions:
   
   Option1:  Marking Thread Share
   
   Provide an marking on the Task, and mark this Task to support Thread Share. 
In the specific implementation of the task, marking whether the task supports 
thread sharing. Tasks that can be shared will share a thread for execution, and 
tasks that cannot be shared will be executed exclusively by a thread.
   
    Whether the Task supports thread sharing is evaluated by the specific 
implementer of the Task. According to the execution time of the Call method, if 
the execution implementation of the Call method is all at the ms level, then 
the Task can be marked as supporting thread sharing.
   
   Option2: Dynamic Thread Share
   
   There is a fundamental problem with the above solution one, that is, the 
execution time of the Call method is often not fixed, and the Task itself is 
not very clear about the calling time of its Call() method. Because different 
stages, different amounts of data, etc. will affect the execution time of 
Call(). It is not very appropriate for such a Task to be marked as supporting 
shared threads or not. Because if a thread is marked as a shareable thread, if 
the execution time of a call to the Call method is very long, this will cause 
the delay of other tasks that share the current thread to be very high. If 
sharing is not supported, the problem of resource waste is still not solved.
   
    So the task thread sharing can be made dynamic, and a group of tasks is 
executed by a thread pool (the number of tasks >> the number of threads). 
During the execution of thread1, if the execution time of call() of Task1 
exceeds the set value (100ms), a thread thread2 will be taken out from the 
thread pool to execute the Call method of the next Task2. It is guaranteed that 
the delay of other tasks will not be too high due to the long execution time of 
Task1. When the call method of Task2 is executed normally within the timeout 
period, it will put Task2 back at the end of the task queue, and thread2 will 
continue to take out Task3 from the task queue to execute the Call method. When 
the call method of Task1 is executed, thread1 will be put back into the thread 
pool, and Task1 will be marked as timed out once. When a certain task's Call 
method executes timeout times reaches a certain limit, the task will be removed 
from the shared thread task queue, and a thread will be used exclusi
 vely.
   
   The related execution process is as follows:
   
   <img width="1117" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309581-7b8e077e-0c83-445d-908a-a4646eaecf2e.png";>
   <img width="1124" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309620-01277e29-bcb4-4cf4-93d9-fde0477c2bee.png";>
   <img width="1092" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309640-a2fb6a04-b2e2-40d8-81db-9517f7f4a502.png";>
   <img width="1084" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309673-6899c9ec-ba39-4ff2-b721-5dd9810073c9.png";>
   <img width="1078" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309694-5561b05b-5bd9-40ca-92d5-21f88a1dc8a1.png";>
   <img width="1158" alt="image" 
src="https://user-images.githubusercontent.com/83933160/181309745-87e04a38-3a38-42ef-9e46-77d71695f28d.png";>
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to