zhangshenghang opened a new issue, #8205: URL: https://github.com/apache/seatunnel/issues/8205
### Search before asking - [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description Currently, our task slot allocation strategy is: Random. We plan to add two new scheduling strategies: 1. SLOT_RATIO 2. SYSTEM_LOAD ## Detailed Plan ### SLOT_RATIO This strategy schedules based on the usage rate of the worker's slots. Slots with lower usage rates will have higher priority. **Calculation Logic**: 1. Obtain the total number of worker slots. 2. Get the number of unallocated slots. 3. Usage rate = (Total slots - Unallocated slots) / Total slots. ### SYSTEM_LOAD 1. **Weight Distribution and Calculation Explanation** - **Time Weight Design**: The time weight distribution is `4, 2, 2, 1, 1`, and it can be normalized to maintain consistency in the total. The weight for each time period is calculated as: <img width="407" alt="image" src="https://github.com/user-attachments/assets/b5688c1e-5588-49c5-9a0e-3acabfcc6961"><br>- The weight for the most recent time is $0.4$, $0.2$ for three minutes ago, and so on. - **CPU and Memory Resource Contribution**: The CPU and memory utilization rates are combined with their respective weights to calculate the credibility of the system resource utilization. The formula is: <img width="1066" alt="image" src="https://github.com/user-attachments/assets/a103952c-f399-4ff0-b98c-ee702e54e309"> - **Time Decay Factor**: The comprehensive resource utilization rate is multiplied by the corresponding time weight after each calculation to obtain a time-weighted average. 2. **Overall Scheduling Formula** The calculation formula for the overall scheduling priority is integrated as follows: <img width="1158" alt="image" src="https://github.com/user-attachments/assets/fde4119a-47c2-4337-9bd6-ff4f318e6ea2"> Where $i$ represents the $i$-th statistical value, and the time weight is $\frac{\text{Single Weight}}{10}$. 3. **Implementation Logic** - **Data Collection**: - Collect CPU and memory utilization every 3 minutes, storing the last 5 statistics. - Each time collection binds the data to the corresponding time weight. - **Priority Calculation**: - Based on the collected CPU and memory utilization, calculate the scheduling priority for each instance using the formula. - Use the calculated result as the core basis for load distribution. - **Dynamic Adjustment**: - Use a sliding window to update the most recent 5 statistics. - Reduce the weight of older data to better adapt to the latest load changes. 4. **Example Data Calculation** - Assume the CPU and memory utilization rates for 5 instances are as follows: <img width="445" alt="image" src="https://github.com/user-attachments/assets/fb05b517-6139-4aa7-bbef-d7cd15bdb54b"> - The CPU and memory weight configurations are both $0.5$, and the time weights are $[0.4, 0.2, 0.2, 0.1, 0.1]$. - The corresponding scheduling priority is calculated as: <img width="814" alt="image" src="https://github.com/user-attachments/assets/998a9a18-7a7e-44a4-ab4e-db0af946cd3f"> - The final result is the scheduling priority value, which can be used for load distribution. ### Usage Scenario _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
