good!

Leonard (Lifeng Nie) <nielif...@apache.org> wrote on Wed, Dec 4, 2024 at 10:56:

> The design looks good to me.
>
> But the picture you provided doesn't seem to display properly.
>
> Jia Fan <fanjia1...@gmail.com> wrote on Wed, Dec 4, 2024 at 09:45:
>
> > Thanks shenghang!
> > The design looks good to me.
> >
> > zhangshenghang <shengh...@apache.org> wrote on Tue, Dec 3, 2024 at 20:52:
> >
> > > Hi SeaTunnel members,
> > >
> > > I would like to discuss the optimization plan for the SeaTunnel engine
> > > task scheduling strategy.
> > >
> > > Currently, our task slot allocation strategy is: Random.
> > >
> > > We plan to add two new scheduling strategies:
> > >
> > > 1. SLOT_RATIO
> > > 2. SYSTEM_LOAD
> > >
> > > Detailed Plan
> > >
> > > SLOT_RATIO
> > >
> > > This strategy schedules based on the usage rate of each worker's slots.
> > > Workers with lower slot usage rates will have higher priority.
> > >
> > > *Calculation Logic*:
> > >
> > > 1. Obtain the total number of worker slots.
> > > 2. Get the number of unallocated slots.
> > > 3. Usage rate = (Total slots - Unallocated slots) / Total slots.
> > >
> > > SYSTEM_LOAD
> > >
> > > *Weight Distribution and Calculation Explanation*
> > >
> > > - *Time Weight Design*: The time weight distribution is 4, 2, 2, 1, 1,
> > >   and it can be normalized to maintain consistency in the total. The
> > >   weight for each time period is calculated as:
> > >   [image: image.png]
> > >   - The most recent sample has weight 0.4, the sample from three
> > >     minutes ago has weight 0.2, and so on.
> > > - *CPU and Memory Resource Contribution*: The CPU and memory utilization
> > >   rates are combined with their respective weights to calculate the
> > >   credibility of the system resource utilization. The formula is:
> > >   [image: image.png]
> > > - *Time Decay Factor*: The comprehensive resource utilization rate is
> > >   multiplied by the corresponding time weight after each calculation to
> > >   obtain a time-weighted average.
> > >
> > > *Overall Scheduling Formula*
> > >
> > > The calculation formula for the overall scheduling priority is
> > > integrated as follows:
> > >
> > > [image: image.png]
> > > [image: image.png]
> > >
> > > *Implementation Logic*
> > >
> > > - *Data Collection*:
> > >   - Collect CPU and memory utilization every 3 minutes, storing the
> > >     last 5 statistics.
> > >   - Each collection binds the data to the corresponding time weight.
> > > - *Priority Calculation*:
> > >   - Based on the collected CPU and memory utilization, calculate the
> > >     scheduling priority for each instance using the formula.
> > >   - Use the calculated result as the core basis for load distribution.
> > > - *Dynamic Adjustment*:
> > >   - Use a sliding window to update the most recent 5 statistics.
> > >   - Reduce the weight of older data to better adapt to the latest load
> > >     changes.
> > >
> > > *Example Data Calculation*
> > >
> > > - Assume the CPU and memory utilization rates for 5 instances are as
> > >   follows:
> > >   [image: image.png]
> > > - The CPU and memory weight configurations are both 0.5, and the time
> > >   weights are [0.4, 0.2, 0.2, 0.1, 0.1].
> > > - The corresponding scheduling priority is calculated as:
> > >   [image: image.png]
> > > - The final result is the scheduling priority value, which can be used
> > >   for load distribution.
> > >
> > > Looking forward to your suggestions.
> > >
> > > You can also discuss it in the issue:
> > > https://github.com/apache/seatunnel/issues/8205
> > >
> > > Regards,
> > > Jast (Shenghang)
> > >
> >
>
> --
> Warm Regards,
>
> Leonard(LiFeng Nie)
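
For reference, the SLOT_RATIO and SYSTEM_LOAD steps quoted above can be sketched in Python. The formula images are not visible in this thread, so this is only one possible reading of the text, not the proposal's actual formula. It assumes the per-sample load is cpu_weight * cpu + mem_weight * mem, that samples are ordered newest first, that the 4, 2, 2, 1, 1 weights are renormalized when fewer than five samples exist, and that the worker with the lowest weighted load is preferred. The names slot_usage_rate, weighted_load, record_sample and pick_worker are hypothetical and are not SeaTunnel APIs.

```python
from collections import deque

TIME_WEIGHTS = [4, 2, 2, 1, 1]  # newest sample first, as listed in the proposal
WINDOW = len(TIME_WEIGHTS)      # the proposal keeps the last 5 statistics


def slot_usage_rate(total_slots, unallocated_slots):
    """SLOT_RATIO: usage rate = (total - unallocated) / total."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return (total_slots - unallocated_slots) / total_slots


def weighted_load(samples, cpu_weight=0.5, mem_weight=0.5):
    """Time-weighted load for one worker.

    samples: (cpu, mem) utilizations in [0, 1], newest first.
    Assumption: per-sample load = cpu_weight * cpu + mem_weight * mem.
    Assumption: time weights are renormalized over the samples that exist.
    """
    samples = list(samples)[:WINDOW]
    if not samples:
        raise ValueError("need at least one sample")
    weights = TIME_WEIGHTS[:len(samples)]
    total = sum(weights)
    return sum(
        (w / total) * (cpu_weight * cpu + mem_weight * mem)
        for w, (cpu, mem) in zip(weights, samples)
    )


def record_sample(history, cpu, mem):
    """Sliding window: newest sample goes to the front.

    history must be a deque created with maxlen=WINDOW, so the oldest
    sample is dropped automatically once the window is full.
    """
    history.appendleft((cpu, mem))


def pick_worker(histories):
    """Assumption: the worker with the lowest weighted load is preferred."""
    return min(histories, key=lambda name: weighted_load(histories[name]))


# Illustrative numbers only, not taken from the proposal.
histories = {
    "worker-a": deque([(0.9, 0.8), (0.5, 0.5)], maxlen=WINDOW),
    "worker-b": deque([(0.2, 0.4), (0.9, 0.9)], maxlen=WINDOW),
}
chosen = pick_worker(histories)
```

Keeping the window in a bounded deque means the "reduce the weight of older data" step needs no extra bookkeeping: each new sample shifts the older ones one position toward the smaller weights.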