lintingbin commented on issue #4055: URL: https://github.com/apache/amoro/issues/4055#issuecomment-3797717717
> Thanks [@vaquarkhan](https://github.com/vaquarkhan) for the detailed analysis! You've correctly identified the "noisy neighbor" problem where active partitions prevent quiet ones from being optimized.
>
> However, I think the per-partition tracking approach might be **overengineered** for this specific issue. Here are my concerns:
>
> ## Issues with Per-Partition Tracking:
> 1. **State management overhead**: For tables with many historical partitions (e.g., daily partitions over years), we'd need to maintain a large map. Most of these entries would be used only once or very rarely, which is wasteful.
> 2. **Memory/storage overhead**: The `Map<String, Long>` would grow indefinitely unless we implement cleanup logic, adding more complexity.
>
> ## Alternative Lightweight Solution:
> I propose a **table-level approach** with a simple tweak to the original issue's suggestion:
>
>     protected boolean reachMinorInterval() {
>       if (config.getMinorLeastInterval() < 0) {
>         return false;
>       }
>
>       long interval = planTime - lastMinorOptimizingTime;
>
>       if (interval > config.getMinorLeastInterval()) {
>         return true;
>       }
>
>       // Ensure minor optimization runs at least once per day
>       return isDifferentDay(lastMinorOptimizingTime, planTime);
>     }
>
> **Key advantages:**
>
> * ✅ Simple: No new state storage required
> * ✅ Effective: Ensures quiet partitions get optimized at least once daily
> * ✅ Backward compatible: No schema changes needed
> * ✅ Low overhead: Minimal logic change
>
> **How it solves the problem:** Even if active partitions constantly reset `lastMinorOptimizingTime`, the `isDifferentDay()` check ensures that **at least once per day**, partitions with just 2-3 small files will have a chance to be optimized.
>
> For most use cases, daily optimization of quiet partitions should be sufficient. If needed, the interval can still be configured via `minorLeastInterval` for more frequent optimizations.
>
> What do you think?
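
The quoted snippet calls `isDifferentDay()` without defining it. For discussion, here is a minimal standalone sketch of how that helper and the surrounding check might behave. Everything in it is an assumption rather than Amoro code: the class name `MinorIntervalSketch`, the explicit `ZoneId` parameter, passing the state in as method arguments, and treating both timestamps as epoch milliseconds.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class MinorIntervalSketch {

  // Hypothetical helper: true when the two epoch-millisecond timestamps
  // fall on different calendar days in the given time zone.
  static boolean isDifferentDay(long earlierMillis, long laterMillis, ZoneId zone) {
    LocalDate earlier = Instant.ofEpochMilli(earlierMillis).atZone(zone).toLocalDate();
    LocalDate later = Instant.ofEpochMilli(laterMillis).atZone(zone).toLocalDate();
    return !earlier.equals(later);
  }

  // Standalone version of the proposed check. The fields used in the quoted
  // snippet (planTime, lastMinorOptimizingTime, config) are passed in as
  // parameters here so the logic can be exercised in isolation.
  static boolean reachMinorInterval(
      long planTime, long lastMinorOptimizingTime, long minorLeastInterval, ZoneId zone) {
    if (minorLeastInterval < 0) {
      return false; // a negative interval disables the check, as in the quoted snippet
    }
    if (planTime - lastMinorOptimizingTime > minorLeastInterval) {
      return true; // the regular interval has elapsed
    }
    // Fallback: the first plan after a day boundary passes even if the
    // interval has not elapsed.
    return isDifferentDay(lastMinorOptimizingTime, planTime, zone);
  }

  public static void main(String[] args) {
    ZoneId utc = ZoneId.of("UTC");
    long oneHour = 3_600_000L;
    long lastRun = Instant.parse("2025-01-01T23:59:00Z").toEpochMilli();
    long sameDay = Instant.parse("2025-01-01T23:59:30Z").toEpochMilli();
    long nextDay = Instant.parse("2025-01-02T00:01:00Z").toEpochMilli();

    System.out.println(reachMinorInterval(sameDay, lastRun, oneHour, utc)); // false
    System.out.println(reachMinorInterval(nextDay, lastRun, oneHour, utc)); // true
  }
}
```

One design point worth settling is which time zone defines the day boundary. The sketch takes it as a parameter so the behavior is explicit and testable; using the server's default zone would also work, but it makes the boundary depend on deployment.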

[@vaquarkhan](https://github.com/vaquarkhan) would this simpler approach work for your use case? @zhoujinsong What do you think?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
