lintingbin commented on issue #4055:
URL: https://github.com/apache/amoro/issues/4055#issuecomment-3797717717

   > Thanks [@vaquarkhan](https://github.com/vaquarkhan) for the detailed 
analysis! You've correctly identified the "noisy neighbor" problem where active 
partitions prevent quiet ones from being optimized.
   > 
   > However, I think the per-partition tracking approach might be 
**overengineered** for this specific issue. Here are my concerns:
   > 
   > ## Issues with Per-Partition Tracking:
   > 1. **State management overhead**: For tables with many historical 
partitions (e.g., daily partitions over years), we'd need to maintain a large 
map. Most of these entries would be used only once or very rarely, which is 
wasteful.
   > 2. **Memory/storage overhead**: The `Map<String, Long>` would grow 
without bound unless we add cleanup logic, which brings more complexity.
   > 
   > ## Alternative Lightweight Solution:
   > I propose a **table-level approach** with a simple tweak to the original 
issue's suggestion:
   > 
   > protected boolean reachMinorInterval() {
   >     // A negative interval disables this trigger.
   >     if (config.getMinorLeastInterval() < 0) {
   >         return false;
   >     }
   >
   >     long interval = planTime - lastMinorOptimizingTime;
   >
   >     // Normal case: the configured minimum interval has elapsed.
   >     if (interval > config.getMinorLeastInterval()) {
   >         return true;
   >     }
   >
   >     // Fallback: ensure minor optimization runs at least once per day.
   >     // isDifferentDay() is a new helper, not shown here.
   >     return isDifferentDay(lastMinorOptimizingTime, planTime);
   > }
   >
   > **Key advantages:**
   > 
   > * ✅ Simple: No new state storage required
   > * ✅ Effective: Ensures quiet partitions get optimized at least once daily
   > * ✅ Backward compatible: No schema changes needed
   > * ✅ Low overhead: Minimal logic change
   > 
   > **How it solves the problem:** Even if active partitions constantly reset 
`lastMinorOptimizingTime`, the `isDifferentDay()` check returns true for the 
first plan after each day boundary, so **at least once per day** partitions 
with just 2-3 small files get a chance to be optimized.
   > 
   > For most use cases, daily optimization of quiet partitions should be 
sufficient. If needed, the interval can still be configured via 
`minorLeastInterval` for more frequent optimizations.
   > 
   > What do you think? [@vaquarkhan](https://github.com/vaquarkhan) would this 
simpler approach work for your use case?
   
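The proposal above relies on an `isDifferentDay()` helper that is not shown. A minimal sketch of how it might look follows, assuming both timestamps are epoch milliseconds and that "day" means a calendar day in some chosen time zone. The `ZoneId` parameter, the class name `MinorIntervalSketch`, and the parameterized `reachMinorInterval` signature are illustrative assumptions, not existing Amoro code. Note that with a calendar-day check, a plan shortly after midnight passes even if the previous minor optimization ran only minutes earlier; if that is undesirable, a fixed 24-hour cap would be an alternative.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class MinorIntervalSketch {

    // Hypothetical helper: true when the two epoch-millisecond timestamps
    // fall on different calendar days in the given time zone.
    static boolean isDifferentDay(long earlierMillis, long laterMillis, ZoneId zone) {
        LocalDate earlier = Instant.ofEpochMilli(earlierMillis).atZone(zone).toLocalDate();
        LocalDate later = Instant.ofEpochMilli(laterMillis).atZone(zone).toLocalDate();
        return !earlier.equals(later);
    }

    // Standalone version of the proposed check. The values are passed as
    // parameters here only so the sketch runs on its own (an assumption).
    static boolean reachMinorInterval(long planTime, long lastMinorOptimizingTime,
                                      long minorLeastInterval, ZoneId zone) {
        // A negative interval disables this trigger.
        if (minorLeastInterval < 0) {
            return false;
        }
        // Normal case: the configured minimum interval has elapsed.
        if (planTime - lastMinorOptimizingTime > minorLeastInterval) {
            return true;
        }
        // Fallback: the plan falls on a later calendar day than the last run.
        return isDifferentDay(lastMinorOptimizingTime, planTime, zone);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        long oneHour = 3_600_000L;
        long lastRun = Instant.parse("2024-01-01T23:50:00Z").toEpochMilli();
        long sameDay = Instant.parse("2024-01-01T23:55:00Z").toEpochMilli();
        long nextDay = Instant.parse("2024-01-02T00:05:00Z").toEpochMilli();

        // Five minutes later on the same day: interval not reached.
        System.out.println(reachMinorInterval(sameDay, lastRun, oneHour, utc)); // false
        // Fifteen minutes later but past midnight: the daily fallback fires.
        System.out.println(reachMinorInterval(nextDay, lastRun, oneHour, utc)); // true
    }
}
```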
   @zhoujinsong What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
