lintingbin opened a new issue, #4057:
URL: https://github.com/apache/amoro/issues/4057
## Description

Currently, the optimizer may skip rewriting Avro files even when optimization is triggered. This can leave a table with suboptimal performance whenever Avro files are present.

## Problem

Avro is a row-oriented format, whereas Parquet and ORC are columnar, so Avro data files have different characteristics from the table's preferred format. They should be rewritten to the preferred format during optimization to:

- Ensure a consistent file format across the table
- Improve query performance
- Achieve better compression and encoding

## Proposed Solution

Add logic that forces Avro files to be rewritten during optimization, regardless of other conditions. The changes include:

1. Add a `hasAvroFile` flag in `CommonPartitionEvaluator` to track whether any Avro files exist
2. Check the file format using the `ContentFiles.isAvroFile()` method
3. Always mark Avro files for rewriting in both full and partial optimization modes
4. Update partition evaluation so that the presence of Avro files is treated as a condition that makes optimization necessary

## Benefits

- Ensures Avro files are always converted to the preferred format
- Improves overall table health and query performance
- Maintains consistent file formats across the table
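The four proposed changes can be illustrated with a minimal, self-contained Java sketch. It does not use Amoro's real classes: `FileFormat`, `DataFileInfo`, `AvroAwareEvaluator` and the static `isAvroFile` helper are hypothetical stand-ins for a file format type, a content file, `CommonPartitionEvaluator` and `ContentFiles.isAvroFile()`. The size-based rule for non-Avro files is likewise a simplified placeholder for the evaluator's existing logic, not Amoro's actual selection rules.

```java
import java.util.ArrayList;
import java.util.List;

public class AvroRewriteSketch {

    // Hypothetical file format enum (not Amoro's or Iceberg's actual type).
    enum FileFormat { AVRO, PARQUET, ORC }

    // Hypothetical minimal view of a data file: path, format and size in bytes.
    record DataFileInfo(String path, FileFormat format, long sizeBytes) {}

    // Hypothetical helper playing the role the issue assigns to ContentFiles.isAvroFile().
    static boolean isAvroFile(DataFileInfo file) {
        return file.format() == FileFormat.AVRO;
    }

    // Hypothetical, heavily simplified stand-in for CommonPartitionEvaluator.
    static class AvroAwareEvaluator {
        private final long targetSizeBytes;
        private final List<DataFileInfo> rewriteFiles = new ArrayList<>();
        // Step 1: remember whether the partition contains any Avro file.
        private boolean hasAvroFile = false;

        AvroAwareEvaluator(long targetSizeBytes) {
            this.targetSizeBytes = targetSizeBytes;
        }

        void addFile(DataFileInfo file) {
            // Steps 2 and 3: the format check runs before any other rule, so an
            // Avro file is selected for rewriting before other rules are consulted.
            if (isAvroFile(file)) {
                hasAvroFile = true;
                rewriteFiles.add(file);
                return;
            }
            // Placeholder for the evaluator's existing (mode-dependent) rules:
            // here, only files below the target size are selected.
            if (file.sizeBytes() < targetSizeBytes) {
                rewriteFiles.add(file);
            }
        }

        // Step 4: Avro presence alone makes optimization necessary; otherwise
        // the placeholder rule requires at least two small files to merge.
        boolean isNecessary() {
            return hasAvroFile || rewriteFiles.size() > 1;
        }

        List<DataFileInfo> rewriteFiles() {
            return List.copyOf(rewriteFiles);
        }
    }

    public static void main(String[] args) {
        long targetSize = 128L * 1024 * 1024; // hypothetical 128 MiB target file size
        AvroAwareEvaluator evaluator = new AvroAwareEvaluator(targetSize);
        // A large Avro file that the size-based placeholder rule alone would skip.
        evaluator.addFile(new DataFileInfo("data/00001.avro", FileFormat.AVRO, 512L * 1024 * 1024));
        // A large Parquet file that no rule selects.
        evaluator.addFile(new DataFileInfo("data/00002.parquet", FileFormat.PARQUET, 256L * 1024 * 1024));
        System.out.println("necessary = " + evaluator.isNecessary());
        for (DataFileInfo file : evaluator.rewriteFiles()) {
            System.out.println("rewrite: " + file.path());
        }
    }
}
```

Running `main` prints `necessary = true` followed by `rewrite: data/00001.avro`. Placing the format check ahead of the other rules is what makes the rewrite unconditional in this sketch: the 512 MiB Avro file is selected even though a size-only rule would leave it alone.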
