lintingbin opened a new issue, #4057:
URL: https://github.com/apache/amoro/issues/4057

   ## Description
   
   Currently, the optimizer may skip rewriting Avro files even when 
optimization is triggered, leaving them in place. This can lead to suboptimal 
table performance when Avro files exist in the table.
   
   ## Problem
   
   Avro is a row-oriented format, whereas formats such as Parquet and ORC are 
columnar. Avro files should therefore be rewritten to the preferred format 
during optimization to:
   - Ensure a consistent file format across the table
   - Improve query performance
   - Maintain better compression and encoding
   
   ## Proposed Solution
   
   Add logic that forces Avro files to be rewritten during optimization, 
regardless of other conditions. The changes include:
   
   1. Add a `hasAvroFile` flag in `CommonPartitionEvaluator` to track if any 
Avro files exist
   2. Check file format using `ContentFiles.isAvroFile()` method
   3. Always mark Avro files for rewriting in both full and partial 
optimization modes
   4. Update partition evaluation so that the presence of any Avro file is by 
itself enough to make optimization necessary
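
   The four steps above could look roughly like the following minimal sketch. 
It is not the actual Amoro code: `Format`, `FileInfo`, `PartitionEvaluatorSketch`, 
the fragment-size threshold, and the fragment-count threshold are all 
hypothetical stand-ins chosen for illustration. Only the `hasAvroFile` flag name 
comes from this proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class AvroRewriteSketch {

    // Hypothetical stand-in for a file format enum; not Amoro's or Iceberg's type.
    enum Format { AVRO, PARQUET, ORC }

    // Hypothetical stand-in for a data file entry.
    static final class FileInfo {
        final String path;
        final Format format;
        final long sizeBytes;

        FileInfo(String path, Format format, long sizeBytes) {
            this.path = path;
            this.format = format;
            this.sizeBytes = sizeBytes;
        }
    }

    // Simplified partition evaluator: collects files to rewrite and decides
    // whether optimizing the partition is necessary.
    static final class PartitionEvaluatorSketch {
        private final long fragmentSizeBytes;  // assumed threshold for "small" files
        private final int minFragmentCount;    // assumed trigger for normal optimization
        private final List<FileInfo> rewriteFiles = new ArrayList<>();
        private int fragmentCount = 0;
        private boolean hasAvroFile = false;   // step 1: track Avro presence

        PartitionEvaluatorSketch(long fragmentSizeBytes, int minFragmentCount) {
            this.fragmentSizeBytes = fragmentSizeBytes;
            this.minFragmentCount = minFragmentCount;
        }

        // Returns true if the file was marked for rewriting.
        boolean addFile(FileInfo file) {
            // Steps 2 and 3: check the format first and always rewrite Avro files,
            // regardless of size or any other condition.
            if (file.format == Format.AVRO) {
                hasAvroFile = true;
                rewriteFiles.add(file);
                return true;
            }
            // Ordinary rule for other formats: only small fragments are rewritten.
            if (file.sizeBytes < fragmentSizeBytes) {
                fragmentCount++;
                rewriteFiles.add(file);
                return true;
            }
            return false;
        }

        // Step 4: any Avro file makes optimization necessary on its own.
        boolean isNecessary() {
            return hasAvroFile || fragmentCount >= minFragmentCount;
        }

        List<FileInfo> rewriteFiles() {
            return rewriteFiles;
        }
    }

    public static void main(String[] args) {
        PartitionEvaluatorSketch evaluator = new PartitionEvaluatorSketch(16L << 20, 10);
        evaluator.addFile(new FileInfo("a.parquet", Format.PARQUET, 128L << 20));
        evaluator.addFile(new FileInfo("b.avro", Format.AVRO, 128L << 20));
        System.out.println("necessary = " + evaluator.isNecessary());
        System.out.println("files to rewrite = " + evaluator.rewriteFiles().size());
    }
}
```

   In this sketch a single large Avro file is enough to trigger optimization 
for the partition, while a large Parquet file next to it is left untouched.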
   
   ## Benefits
   
   - Ensures Avro files are always converted to the preferred format
   - Improves overall table health and query performance
   - Maintains consistency in file formats across the table


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
