lintingbin opened a new issue, #4055:
URL: https://github.com/apache/amoro/issues/4055

   ### What happened?
   
   ```
     public boolean isMinorNecessary() {
       int smallFileCount = fragmentFileCount + equalityDeleteFileCount;
       return smallFileCount >= config.getMinorLeastFileCount()
           || (smallFileCount > 1 && reachMinorInterval())
           || combinePosSegmentFileCount > 0;
     }
   
     protected boolean reachMinorInterval() {
       return config.getMinorLeastInterval() >= 0
           && planTime - lastMinorOptimizingTime > config.getMinorLeastInterval();
     }
   ```
   If a table has some partitions with many small files and others with only
   two or three, the condition `(smallFileCount > 1 && reachMinorInterval())`
   never evaluates to true for the partitions with only two or three small
   files. `reachMinorInterval()` compares `planTime` against a single
   `lastMinorOptimizingTime`, which appears to be tracked for the whole table,
   so minor optimizations triggered by the busy partitions keep resetting it
   before the interval can elapse. Consequently, the small partitions are never
   included in minor optimizations. `reachMinorInterval` should be evaluated at
   the partition level rather than the table level.
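   
   A minimal sketch of what a partition-level check could look like. The class
   name, the per-partition map, and the choice to treat a never-optimized
   partition as last optimized at time 0 are all assumptions made for
   illustration; they are not taken from the Amoro code base.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical sketch, not Amoro's actual evaluator.
   class PartitionIntervalSketch {
     // Minimum interval in milliseconds; a negative value disables the interval trigger.
     private final long minorLeastInterval;
     // Assumed bookkeeping: last minor optimizing time per partition,
     // instead of one lastMinorOptimizingTime for the whole table.
     private final Map<String, Long> lastMinorTimeByPartition = new HashMap<>();
   
     PartitionIntervalSketch(long minorLeastInterval) {
       this.minorLeastInterval = minorLeastInterval;
     }
   
     // Called when a minor optimization has covered the given partition.
     void recordMinorOptimizing(String partition, long time) {
       lastMinorTimeByPartition.put(partition, time);
     }
   
     // Same comparison as reachMinorInterval(), but against the partition's own timestamp.
     boolean reachMinorInterval(String partition, long planTime) {
       if (minorLeastInterval < 0) {
         return false;
       }
       // Assumption: a partition that was never optimized counts as optimized at time 0.
       long last = lastMinorTimeByPartition.getOrDefault(partition, 0L);
       return planTime - last > minorLeastInterval;
     }
   }
   ```
   
   With this bookkeeping, a minor optimization on a busy partition only resets
   that partition's timestamp, so a partition with two or three small files can
   still reach the interval on its own.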
   
   ### Affects Versions
   
   0.8.1
   
   ### What table formats are you seeing the problem on?
   
   Iceberg
   
   ### What engines are you seeing the problem on?
   
   Spark
   
   ### How to reproduce
   
   _No response_
   
   ### Relevant log output
   
   ```shell
   
   ```
   
   ### Anything else
   
   ```
   protected boolean reachMinorInterval() {
       if (config.getMinorLeastInterval() < 0) {
           return false;
       }
       
       long interval = planTime - lastMinorOptimizingTime;
       
       if (interval > config.getMinorLeastInterval()) {
           return true;
       }
       
       return isDifferentDay(lastMinorOptimizingTime, planTime);
   }
   ```
   `reachMinorInterval` could be changed to follow the logic above, so that it
   also evaluates to true once the calendar day has changed since the last
   minor optimization, that is, at least once per day. That way, partitions
   with only two or three small files also get a chance to be optimized.
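   
   The proposal above calls `isDifferentDay` without defining it. One possible
   shape is sketched below, assuming both timestamps are epoch milliseconds and
   that the caller decides which time zone defines a "day". The class name and
   the extra `zone` parameter are hypothetical and not part of the proposal.
   
   ```java
   import java.time.Instant;
   import java.time.LocalDate;
   import java.time.ZoneId;
   
   // Hypothetical helper, not an existing Amoro utility.
   class DayBoundarySketch {
     // Returns true when the two epoch-millisecond timestamps fall on
     // different calendar dates in the given zone.
     static boolean isDifferentDay(long firstMillis, long secondMillis, ZoneId zone) {
       LocalDate first = Instant.ofEpochMilli(firstMillis).atZone(zone).toLocalDate();
       LocalDate second = Instant.ofEpochMilli(secondMillis).atZone(zone).toLocalDate();
       return !first.equals(second);
     }
   }
   ```
   
   Comparing calendar dates rather than elapsed time means the check can fire
   shortly after midnight even if the previous minor optimization ran just
   before it, so the zone passed in determines when the daily trigger happens.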
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's Code of Conduct


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
