lintingbin commented on issue #4055:
URL: https://github.com/apache/amoro/issues/4055#issuecomment-3797707009
Thanks @vaquarkhan for the detailed analysis! You've correctly identified
the "noisy neighbor" problem where active partitions prevent quiet ones from
being optimized.
However, I think the per-partition tracking approach might be
**overengineered** for this specific issue. Here are my concerns:
## Issues with Per-Partition Tracking:
1. **State management overhead**: For tables with many historical partitions
(e.g., daily partitions over years), we'd need to maintain a large map. Most of
these entries would be used only once or very rarely, which is wasteful.
2. **Memory/storage overhead**: The `Map<String, Long>` would grow
indefinitely unless we implement cleanup logic, adding more complexity.
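To make that state concrete, here is a minimal sketch of what per-partition tracking might look like. `PartitionMinorTracker`, its methods, and the size-bounded eviction policy are hypothetical illustrations. They are not Amoro code and not the exact proposal. The bound is just one possible cleanup strategy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical per-partition tracker, sketched only to show the state the
 * per-partition proposal would need to keep; not an Amoro class.
 */
public final class PartitionMinorTracker {

  private final Map<String, Long> lastMinorOptimizingTimeByPartition;

  public PartitionMinorTracker(int maxEntries) {
    // Access-ordered LinkedHashMap that evicts the least recently used
    // partition once maxEntries is exceeded: one possible cleanup policy
    this.lastMinorOptimizingTimeByPartition =
        new LinkedHashMap<String, Long>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
            return size() > maxEntries;
          }
        };
  }

  public void recordMinorOptimizing(String partition, long timeMillis) {
    lastMinorOptimizingTimeByPartition.put(partition, timeMillis);
  }

  /** Returns -1 when the partition was never recorded or has been evicted. */
  public long lastMinorOptimizingTime(String partition) {
    return lastMinorOptimizingTimeByPartition.getOrDefault(partition, -1L);
  }

  public int trackedPartitions() {
    return lastMinorOptimizingTimeByPartition.size();
  }
}
```

Even this small version forces extra decisions. An evicted partition silently looks as if it was never optimized, and the bound itself becomes one more setting to tune.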
## Alternative Lightweight Solution:
I propose a **table-level approach** with a simple tweak to the original
issue's suggestion:
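```java
protected boolean reachMinorInterval() {
  // A negative interval disables the interval-based minor trigger
  if (config.getMinorLeastInterval() < 0) {
    return false;
  }
  // Time elapsed since the last table-level minor optimizing
  long interval = planTime - lastMinorOptimizingTime;
  if (interval > config.getMinorLeastInterval()) {
    return true;
  }
  // Ensure minor optimization runs at least once per day: true on the first
  // plan after the calendar day changes, even if active partitions keep
  // resetting lastMinorOptimizingTime
  return isDifferentDay(lastMinorOptimizingTime, planTime);
}
```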
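`isDifferentDay()` is not defined in the snippet. One possible helper could look like the following. It assumes both timestamps are epoch milliseconds and that "day" means a calendar day in a chosen time zone, falling back to the system default when none is given. `DayBoundary` is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical helper for the daily fallback; not an existing Amoro API. */
public final class DayBoundary {

  private DayBoundary() {}

  /** True when the two epoch-millisecond timestamps fall on different calendar days in {@code zone}. */
  public static boolean isDifferentDay(long earlierMillis, long laterMillis, ZoneId zone) {
    LocalDate earlier = Instant.ofEpochMilli(earlierMillis).atZone(zone).toLocalDate();
    LocalDate later = Instant.ofEpochMilli(laterMillis).atZone(zone).toLocalDate();
    return !earlier.equals(later);
  }

  /** Two-argument form matching the call above; uses the JVM's default time zone. */
  public static boolean isDifferentDay(long earlierMillis, long laterMillis) {
    return isDifferentDay(earlierMillis, laterMillis, ZoneId.systemDefault());
  }

  public static void main(String[] args) {
    ZoneId utc = ZoneId.of("UTC");
    long lateEvening = Instant.parse("2024-01-01T23:50:00Z").toEpochMilli();
    long justAfterMidnight = Instant.parse("2024-01-02T00:10:00Z").toEpochMilli();
    long noon = Instant.parse("2024-01-01T12:00:00Z").toEpochMilli();
    System.out.println(isDifferentDay(lateEvening, justAfterMidnight, utc)); // true
    System.out.println(isDifferentDay(noon, lateEvening, utc)); // false
  }
}
```

Which time zone defines the day boundary is a design choice. The system default ties the trigger to the clock of the host running the planner, while a fixed zone such as UTC keeps the daily window predictable across deployments.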
**Key advantages:**
- ✅ Simple: No new state storage required
- ✅ Effective: Ensures quiet partitions get optimized at least once daily
- ✅ Backward compatible: No schema changes needed
- ✅ Low overhead: Minimal logic change
**How it solves the problem:**
Even if active partitions constantly reset `lastMinorOptimizingTime`, so that the interval check never passes, the `isDifferentDay()` check lets the first plan of each new calendar day trigger minor optimization. That way, partitions with just 2-3 small files get a chance to be optimized **at least once per day**.
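The scenario can be checked with a small simulation. It assumes a one-hour `minorLeastInterval`, a plan every 10 minutes, UTC day boundaries, and a busy partition that triggers minor optimizing on every plan, so the table-level timestamp is reset each time. `NoisyNeighborSimulation` and its methods are hypothetical and only mirror the proposed logic.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the noisy-neighbor scenario; not Amoro code. */
public final class NoisyNeighborSimulation {

  // Assumed minor trigger interval: one hour, in milliseconds
  static final long MINOR_LEAST_INTERVAL = 60L * 60 * 1000;
  // Assumed planning cadence: one plan every 10 minutes
  static final long PLAN_STEP = 10L * 60 * 1000;

  static boolean isDifferentDay(long earlier, long later) {
    LocalDate a = Instant.ofEpochMilli(earlier).atZone(ZoneOffset.UTC).toLocalDate();
    LocalDate b = Instant.ofEpochMilli(later).atZone(ZoneOffset.UTC).toLocalDate();
    return !a.equals(b);
  }

  // Mirrors the proposed reachMinorInterval(), with the daily fallback switchable
  static boolean reachMinorInterval(long planTime, long lastMinorOptimizingTime, boolean dailyFallback) {
    if (planTime - lastMinorOptimizingTime > MINOR_LEAST_INTERVAL) {
      return true;
    }
    return dailyFallback && isDifferentDay(lastMinorOptimizingTime, planTime);
  }

  /** Plan times at which the quiet partition would be allowed to run minor optimizing. */
  public static List<Long> quietPartitionTriggers(long start, int days, boolean dailyFallback) {
    List<Long> triggers = new ArrayList<>();
    long lastMinorOptimizingTime = start;
    long end = start + days * 24L * 60 * 60 * 1000;
    for (long planTime = start + PLAN_STEP; planTime <= end; planTime += PLAN_STEP) {
      if (reachMinorInterval(planTime, lastMinorOptimizingTime, dailyFallback)) {
        triggers.add(planTime);
      }
      // The busy partition optimizes on every plan, resetting the table-level timestamp
      lastMinorOptimizingTime = planTime;
    }
    return triggers;
  }

  public static void main(String[] args) {
    long start = Instant.parse("2024-01-01T00:00:00Z").toEpochMilli();
    System.out.println("without fallback: " + quietPartitionTriggers(start, 2, false).size()); // 0
    for (long t : quietPartitionTriggers(start, 2, true)) {
      System.out.println("fallback trigger at " + Instant.ofEpochMilli(t));
    }
  }
}
```

Without the fallback, the quiet partition never reaches the interval. With it, the quiet partition gets a chance exactly at the first plan of each new day, here at 2024-01-02T00:00Z and 2024-01-03T00:00Z.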
For most use cases, daily optimization of quiet partitions should be
sufficient. If needed, the interval can still be configured via
`minorLeastInterval` for more frequent optimizations.
What do you think, @lintingbin? Would this simpler approach work for your use
case?