[I] [Feature][Core] Add slot allocation strategy [seatunnel]

via GitHub Tue, 03 Dec 2024 04:42:21 -0800


zhangshenghang opened a new issue, #8205:
URL: https://github.com/apache/seatunnel/issues/8205

### Search before asking

- [X] I have searched the
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
issues and found no similar feature requirement.

### Description

Currently, task slots are allocated with a single strategy: random.

We plan to add two new scheduling strategies:
1. SLOT_RATIO
2. SYSTEM_LOAD

## Detailed Plan

### SLOT_RATIO
This strategy schedules based on the slot usage rate of each worker. Workers
with lower usage rates get higher priority.

**Calculation Logic**:
1. Obtain the total number of worker slots.
2. Get the number of unallocated slots.
3. Usage rate = (Total slots - Unallocated slots) / Total slots.
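
The three steps above can be sketched as follows. This is a minimal illustration, not SeaTunnel code: `WorkerSlots`, `slot_usage_rate`, and `pick_worker` are hypothetical names, and skipping workers with no free slot is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkerSlots:
    """Hypothetical per-worker slot snapshot (not a SeaTunnel class)."""
    name: str
    total_slots: int
    unallocated_slots: int


def slot_usage_rate(worker: WorkerSlots) -> float:
    """Usage rate = (total slots - unallocated slots) / total slots."""
    if worker.total_slots <= 0:
        # A worker with no slots cannot take tasks; treat it as fully used.
        return 1.0
    return (worker.total_slots - worker.unallocated_slots) / worker.total_slots


def pick_worker(workers: list[WorkerSlots]) -> WorkerSlots:
    """Return the worker with the lowest usage rate that still has a free slot."""
    candidates = [w for w in workers if w.unallocated_slots > 0]
    if not candidates:
        raise ValueError("no worker has a free slot")
    return min(candidates, key=slot_usage_rate)


workers = [
    WorkerSlots("worker-a", total_slots=10, unallocated_slots=2),  # 80% used
    WorkerSlots("worker-b", total_slots=8, unallocated_slots=6),   # 25% used
    WorkerSlots("worker-c", total_slots=4, unallocated_slots=0),   # full
]
chosen = pick_worker(workers)
print(chosen.name, slot_usage_rate(chosen))  # worker-b 0.25
```

Using a ratio rather than the raw count of free slots keeps workers with different slot totals comparable.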

### SYSTEM_LOAD

1. **Weight Distribution and Calculation Explanation**
- **Time Weight Design**:
The time weights are `4, 2, 2, 1, 1`, normalized so that they sum to 1.
The weight for each time period is calculated as:
<img width="407" alt="image" src="https://github.com/user-attachments/assets/b5688c1e-5588-49c5-9a0e-3acabfcc6961"><br>
- The weight for the most recent sample is $0.4$, the weight for the sample
from three minutes ago is $0.2$, and so on.

- **CPU and Memory Resource Contribution**:
The CPU and memory utilization rates are each multiplied by their
respective weights and combined into a single resource utilization
value. The formula is:

- **Time Decay Factor**:
Each sample's combined resource utilization rate is multiplied by its
time weight, and the weighted values are summed to obtain a time-weighted
average.
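
The weight handling described in this step can be sketched as below. The raw weights `4, 2, 2, 1, 1` come from the text; the function names are hypothetical, and the plain weighted sum in `combined_utilization` is an assumption about how CPU and memory are combined, since the text only says they are combined "with their respective weights".

```python
RAW_TIME_WEIGHTS = [4, 2, 2, 1, 1]  # newest sample first, as listed in the issue


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combined_utilization(cpu, mem, cpu_weight=0.5, mem_weight=0.5):
    """Assumed form: weighted sum of CPU and memory utilization, each in [0, 1]."""
    return cpu * cpu_weight + mem * mem_weight


time_weights = normalize(RAW_TIME_WEIGHTS)
print(time_weights)  # [0.4, 0.2, 0.2, 0.1, 0.1]

# One sample with 60% CPU and 40% memory utilization, weighted equally.
sample_load = combined_utilization(0.6, 0.4)
print(sample_load)
```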

2. **Overall Scheduling Formula**
The overall scheduling priority is calculated as follows:

Where $i$ denotes the $i$-th sample, and each time weight is
$\frac{\text{Single Weight}}{10}$.
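
One form consistent with the definitions in this section is sketched here. It is an assumption, not a confirmed formula: $C_i$ and $M_i$ stand for the CPU and memory utilization of the $i$-th sample, $W_{\text{cpu}}$ and $W_{\text{mem}}$ for the resource weights, and $T_i \in (4, 2, 2, 1, 1)$ for the single time weights. The text does not say whether utilization or idle capacity ($1 - C_i$, $1 - M_i$) enters the sum; utilization is assumed, so a lower value means a less loaded worker.

$$
P = \sum_{i=1}^{5} \left( W_{\text{cpu}} \cdot C_i + W_{\text{mem}} \cdot M_i \right) \cdot \frac{T_i}{10}
$$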

3. **Implementation Logic**
- **Data Collection**:
- Collect CPU and memory utilization every 3 minutes and keep the
last 5 samples.
- Bind each collected sample to the time weight that matches its
age.
- **Priority Calculation**:
- Based on the collected CPU and memory utilization, calculate the
scheduling priority for each instance using the formula.
- Use the calculated result as the core basis for load distribution.
- **Dynamic Adjustment**:
- Use a sliding window to update the most recent 5 statistics.
- Reduce the weight of older data to better adapt to the latest load
changes.
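
The sliding window can be sketched with a bounded deque, shown below. `LoadWindow` is a hypothetical class, the 3-minute collection timer is left out, the per-sample load is assumed to be a weighted sum of utilization, and renormalizing while fewer than 5 samples exist is also an assumption.

```python
from collections import deque

TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]  # newest sample first


class LoadWindow:
    """Hypothetical holder for the last five (cpu, mem) samples of one worker."""

    def __init__(self, size=5):
        # With maxlen set, the oldest sample is dropped automatically.
        self.samples = deque(maxlen=size)

    def record(self, cpu, mem):
        # The newest sample goes to the front so index 0 lines up with weight 0.4.
        self.samples.appendleft((cpu, mem))

    def weighted_load(self, cpu_weight=0.5, mem_weight=0.5):
        """Time-weighted load; assumes a weighted sum of utilization per sample."""
        pairs = list(zip(self.samples, TIME_WEIGHTS))
        if not pairs:
            return 0.0
        used = sum(w for _, w in pairs)
        total = sum(
            (cpu * cpu_weight + mem * mem_weight) * w for (cpu, mem), w in pairs
        )
        # Renormalize while the window is still filling up (an assumption).
        return total / used


window = LoadWindow()
for cpu, mem in [(0.2, 0.3), (0.4, 0.5), (0.9, 0.8)]:  # oldest to newest
    window.record(cpu, mem)
print(len(window.samples), window.weighted_load())
```

Because each new sample shifts the older ones one position back, a sample's weight shrinks as it ages without any explicit decay step.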

4. **Example Data Calculation**
- Assume the CPU and memory utilization rates for 5 instances are as
follows:
<img width="445" alt="image" src="https://github.com/user-attachments/assets/fb05b517-6139-4aa7-bbef-d7cd15bdb54b">

- The CPU and memory weight configurations are both $0.5$, and the time
weights are $[0.4, 0.2, 0.2, 0.1, 0.1]$.
- The corresponding scheduling priority is calculated as:

- The final result is the scheduling priority value, which can be used for
load distribution.
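
A worked calculation with the weights above might look like the sketch below. The utilization numbers are hypothetical and are not the values from the table in the issue; the per-sample weighted sum is likewise an assumed form of the formula.

```python
TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]  # newest sample first
CPU_WEIGHT = 0.5
MEM_WEIGHT = 0.5

# Hypothetical utilization samples, newest first (NOT the issue's table values).
cpu = [0.50, 0.60, 0.40, 0.70, 0.30]
mem = [0.40, 0.50, 0.60, 0.30, 0.70]


def weighted_load(cpu_samples, mem_samples):
    """Sum of (CPU * 0.5 + memory * 0.5) * time weight over the five samples."""
    return sum(
        (c * CPU_WEIGHT + m * MEM_WEIGHT) * w
        for c, m, w in zip(cpu_samples, mem_samples, TIME_WEIGHTS)
    )


score = weighted_load(cpu, mem)
print(round(score, 3))  # 0.49
```

Term by term, the per-sample values are 0.45, 0.55, 0.50, 0.50, and 0.50, so the result is 0.45·0.4 + 0.55·0.2 + 0.50·0.2 + 0.50·0.1 + 0.50·0.1 = 0.49.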

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)

