Jzjsnow commented on PR #3776:
URL: https://github.com/apache/amoro/pull/3776#issuecomment-3556539851
In the latest commit, we've revamped the EventBasedTrigger logic and added new
configuration options.
The EventBasedTrigger now has two key parameters:
- FallbackInterval: The minimum interval between runs of the original
tryEvaluatingPendingInput logic. It guards against false positives and missed
triggers from the metadata-metric-driven evaluation. Defaults to -1 (disabled);
enabled when set to >= 0.
- MseTolerance: The tolerance threshold for the MSE of partition file sizes
(default: 0). Partitions whose actual MSE is below this threshold are treated
as not needing optimization.
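To illustrate how the MseTolerance check could behave, here is a minimal sketch.
It assumes the MSE is the mean squared deviation of each file's size from the
target file size; that definition, and the names `MseSketch`, `fileSizeMse`, and
`exceedsTolerance`, are illustrative assumptions and not taken from the PR.

```java
import java.util.List;

// Hypothetical sketch, not the PR's code.
final class MseSketch {
    // Assumed definition: mean of (fileSize - targetSize)^2 over the partition's files.
    static double fileSizeMse(List<Long> fileSizes, long targetSize) {
        if (fileSizes.isEmpty()) {
            return 0.0; // no files, nothing deviates from the target
        }
        double sum = 0.0;
        for (long size : fileSizes) {
            double diff = (double) size - targetSize;
            sum += diff * diff;
        }
        return sum / fileSizes.size();
    }

    // A partition below the tolerance is skipped; at or above it, it stays a candidate.
    static boolean exceedsTolerance(List<Long> fileSizes, long targetSize, double mseTolerance) {
        return fileSizeMse(fileSizes, targetSize) >= mseTolerance;
    }
}
```

Under this assumption, the default tolerance of 0 filters nothing out, since an
MSE is never negative.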
When enabled, the flow is now:
- Determine whether `tryEvaluatingPendingInput` needs to run:
  - If the FallbackInterval has elapsed, trigger
    `tryEvaluatingPendingInput` directly.
  - Skip evaluation for empty tables.
  - Skip if the condition `delete file count = 0 && avg file size > target
    size * ratio` is satisfied (no need for pending input evaluation).
- Execute `tryEvaluatingPendingInput` if necessary:
  - Use the existing scan logic to retrieve partition file information.
  - Decide whether each partition should be marked as pending: if the MSE
    threshold is met, further determine the optimization type
    (minor/major/full).
  - Update the pendingInput-related information.
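The pre-check in the first step can be sketched roughly as follows. This is
only an illustration of the decision order described above: the class name,
method name, and parameters are hypothetical, the check order simply follows
the order of the list above, and "empty table" is assumed to mean a table with
no data files and no delete files.

```java
// Hypothetical sketch of the pre-check, not the PR's code.
final class TriggerPrecheckSketch {
    static boolean shouldEvaluate(
            long fallbackIntervalMs,      // -1 disables the fallback, >= 0 enables it
            long msSinceLastEvaluation,
            long dataFileCount,
            long deleteFileCount,
            double avgFileSize,
            long targetSize,
            double ratio) {
        // 1. Fallback: run the original evaluation once the interval has elapsed.
        if (fallbackIntervalMs >= 0 && msSinceLastEvaluation >= fallbackIntervalMs) {
            return true;
        }
        // 2. Empty table (assumed: no files at all): nothing to evaluate.
        if (dataFileCount == 0 && deleteFileCount == 0) {
            return false;
        }
        // 3. No delete files and files are already large enough: skip.
        if (deleteFileCount == 0 && avgFileSize > targetSize * ratio) {
            return false;
        }
        // Otherwise, pending input evaluation is needed.
        return true;
    }
}
```

For example, with the fallback disabled (-1), a table with 10 data files, no
delete files, an average file size of 120, a target size of 128, and a ratio of
0.8 would be skipped, because 120 > 128 * 0.8.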
Please take a look when you are free! @xxubai @klion26
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]