zhangshenghang opened a new issue, #8205:
URL: https://github.com/apache/seatunnel/issues/8205

   ### Search before asking
   
   - [X] I had searched in the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   Currently, task slots are allocated with a single strategy: Random.
   
   We plan to add two new scheduling strategies:
   1. SLOT_RATIO
   2. SYSTEM_LOAD
   
   ## Detailed Plan
   
   ### SLOT_RATIO
   This strategy schedules based on the usage rate of each worker's slots. 
Workers with lower slot usage rates have higher priority.
   
   **Calculation Logic**:
   1. Obtain the total number of worker slots.
   2. Get the number of unallocated slots.
   3. Usage rate = (Total slots - Unallocated slots) / Total slots.
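
A minimal sketch of the calculation above, assuming a simplified view of a worker that only exposes its total and unallocated slot counts. `SlotRatioSketch`, `WorkerSlots`, and `pickWorker` are hypothetical names for illustration, not SeaTunnel's actual classes; skipping workers with no free slot and treating a worker with zero slots as fully used are also assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical types for illustration only; not SeaTunnel's actual classes.
final class SlotRatioSketch {

    /** Simplified view of a worker: total slots and slots not yet allocated. */
    static final class WorkerSlots {
        final String workerId;
        final int totalSlots;
        final int unallocatedSlots;

        WorkerSlots(String workerId, int totalSlots, int unallocatedSlots) {
            this.workerId = workerId;
            this.totalSlots = totalSlots;
            this.unallocatedSlots = unallocatedSlots;
        }

        /** Usage rate = (total slots - unallocated slots) / total slots. */
        double usageRate() {
            if (totalSlots <= 0) {
                return 1.0; // assumption: a worker without slots counts as fully used
            }
            return (double) (totalSlots - unallocatedSlots) / totalSlots;
        }
    }

    /** Returns the worker with the lowest usage rate among those with a free slot. */
    static Optional<WorkerSlots> pickWorker(List<WorkerSlots> workers) {
        return workers.stream()
                .filter(w -> w.unallocatedSlots > 0)
                .min(Comparator.comparingDouble(WorkerSlots::usageRate));
    }
}
```

With workers at 8/10, 3/10, and 4/4 slots in use, `pickWorker` would return the second one, since it has the lowest usage rate and still has free slots.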
   
   ### SYSTEM_LOAD
   
   1. **Weight Distribution and Calculation Explanation**
       - **Time Weight Design**:  
         The raw time weights are `4, 2, 2, 1, 1`, normalized so that they 
sum to 1. The weight for each time period is calculated as:  
   <img width="407" alt="image" 
src="https://github.com/user-attachments/assets/b5688c1e-5588-49c5-9a0e-3acabfcc6961">

         The weight for the most recent sample is $0.4$, the weight for the 
sample taken three minutes earlier is $0.2$, and so on.
   
       - **CPU and Memory Resource Contribution**:  
         The CPU and memory utilization rates are combined with their 
respective weights to calculate the comprehensive system resource 
utilization. The formula is:  
         
   <img width="1066" alt="image" 
src="https://github.com/user-attachments/assets/a103952c-f399-4ff0-b98c-ee702e54e309">
   
   
       - **Time Decay Factor**:  
         Each sample's comprehensive resource utilization rate is multiplied 
by its time weight, and the products are summed to obtain a time-weighted 
average.
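
The normalization of the time weights can be sketched as follows. This is only an illustration: `TimeWeightSketch` and `normalize` are hypothetical names, and the code assumes the raw weights are the integers `4, 2, 2, 1, 1` described above.

```java
// Hypothetical helper for illustration only.
final class TimeWeightSketch {

    /** Divides each raw weight by the total so the normalized weights sum to 1. */
    static double[] normalize(int[] rawWeights) {
        int sum = 0;
        for (int w : rawWeights) {
            sum += w;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("raw weights must have a positive sum");
        }
        double[] normalized = new double[rawWeights.length];
        for (int i = 0; i < rawWeights.length; i++) {
            normalized[i] = (double) rawWeights[i] / sum;
        }
        return normalized;
    }
}
```

Calling `TimeWeightSketch.normalize(new int[] {4, 2, 2, 1, 1})` divides each entry by 10, which yields the weights 0.4, 0.2, 0.2, 0.1, 0.1 used in the rest of this proposal.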
   
   2. **Overall Scheduling Formula**  
      Combining the steps above, the overall scheduling priority is calculated 
as follows:  
     
   <img width="1158" alt="image" 
src="https://github.com/user-attachments/assets/fde4119a-47c2-4337-9bd6-ff4f318e6ea2">
    
      Here $i$ indexes the $i$-th collected sample, and each time weight is 
$\frac{\text{Single Weight}}{10}$, since the raw weights `4, 2, 2, 1, 1` sum 
to 10.
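
The formula image is not reproduced in text form, so the following is only a sketch of how the description could translate to code. It assumes the comprehensive utilization of sample `i` is `cpu[i] * cpuWeight + mem[i] * memWeight` and that the score is the time-weighted sum over the five samples, with index 0 being the most recent. `SystemLoadSketch` and `weightedLoad` are hypothetical names, and the actual formula may differ, for instance by scoring idle capacity (`1 - utilization`) instead. Under this assumption a lower score means a less loaded worker.

```java
// Hypothetical sketch; the exact formula in the proposal image may differ.
final class SystemLoadSketch {

    // Normalized time weights, most recent sample first.
    static final double[] TIME_WEIGHTS = {0.4, 0.2, 0.2, 0.1, 0.1};

    /**
     * Time-weighted load score of one worker.
     *
     * @param cpu       CPU utilization per sample in [0, 1], most recent first
     * @param mem       memory utilization per sample in [0, 1], most recent first
     * @param cpuWeight weight of CPU utilization, e.g. 0.5
     * @param memWeight weight of memory utilization, e.g. 0.5
     */
    static double weightedLoad(double[] cpu, double[] mem, double cpuWeight, double memWeight) {
        if (cpu.length != TIME_WEIGHTS.length || mem.length != TIME_WEIGHTS.length) {
            throw new IllegalArgumentException("expected exactly " + TIME_WEIGHTS.length + " samples");
        }
        double score = 0.0;
        for (int i = 0; i < TIME_WEIGHTS.length; i++) {
            // Combine CPU and memory for this sample, then apply its time weight.
            double combined = cpu[i] * cpuWeight + mem[i] * memWeight;
            score += combined * TIME_WEIGHTS[i];
        }
        return score;
    }
}
```

Because both the resource weights and the time weights sum to 1, the score stays in the range 0 to 1 whenever the inputs do, which makes scores comparable across workers.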
   
   3. **Implementation Logic**
       - **Data Collection**:
           - Collect CPU and memory utilization every 3 minutes, keeping the 
5 most recent samples.
           - Each collection binds the sample to its corresponding time 
weight.
       - **Priority Calculation**:
           - Based on the collected CPU and memory utilization, calculate the 
scheduling priority for each instance using the formula.
           - Use the calculated result as the core basis for load distribution.
       - **Dynamic Adjustment**:
           - Use a sliding window to update the most recent 5 statistics.
           - Reduce the weight of older data to better adapt to the latest load 
changes.
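
The sliding window described above could be kept in a small bounded deque. This is a sketch under assumptions: `LoadWindowSketch`, `Sample`, `record`, and `newestFirst` are hypothetical names, and how the samples are obtained every 3 minutes (for example from a scheduled task) is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sliding window of the 5 most recent load samples.
final class LoadWindowSketch {

    /** One CPU and memory utilization reading, each in [0, 1]. */
    static final class Sample {
        final double cpu;
        final double mem;

        Sample(double cpu, double mem) {
            this.cpu = cpu;
            this.mem = mem;
        }
    }

    private static final int WINDOW_SIZE = 5;

    // Head of the deque is the most recent sample.
    private final Deque<Sample> samples = new ArrayDeque<>();

    /** Adds a new sample and evicts the oldest one once the window is full. */
    synchronized void record(double cpu, double mem) {
        samples.addFirst(new Sample(cpu, mem));
        if (samples.size() > WINDOW_SIZE) {
            samples.removeLast();
        }
    }

    /** Snapshot of the window, most recent first, so index i pairs with time weight i. */
    synchronized List<Sample> newestFirst() {
        return new ArrayList<>(samples);
    }
}
```

Keeping the newest sample at index 0 means the position of a sample in the snapshot directly selects its time weight, so older data loses weight automatically as new samples arrive.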
   
   4. **Example Data Calculation**
   - Assume the CPU and memory utilization rates for 5 instances are as 
follows:  
   <img width="445" alt="image" 
src="https://github.com/user-attachments/assets/fb05b517-6139-4aa7-bbef-d7cd15bdb54b">
   
   - The CPU and memory weight configurations are both $0.5$, and the time 
weights are $[0.4, 0.2, 0.2, 0.1, 0.1]$.
   - The corresponding scheduling priority is calculated as:  
         
   <img width="814" alt="image" 
src="https://github.com/user-attachments/assets/998a9a18-7a7e-44a4-ab4e-db0af946cd3f">
   
   - The final result is the scheduling priority value, which can be used for 
load distribution.
   
   
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

