lintingbin opened a new pull request, #4058:
URL: https://github.com/apache/amoro/pull/4058
## Summary
This PR forces Avro-format files to be rewritten during table optimization.
Fixes #4057
## Changes
This PR includes the following changes:
1. **CommonPartitionEvaluator.java**
- Added `hasAvroFile` flag to track Avro file presence
- Updated `addFile()` to detect and flag Avro files
- Modified `fileShouldFullOptimizing()` to always rewrite Avro files
- Updated `fileShouldRewrite()` to prioritize Avro files for rewriting
- Enhanced `isNecessary()` to consider Avro files as a trigger for
optimization
2. **IcebergPartitionPlan.java**
- Updated task validation logic so that a task containing only a single Avro
file is no longer skipped
3. **ContentFiles.java**
- Added `isAvroFile()` utility method to identify Avro format files
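
The following is a minimal, self-contained sketch of how the three changes above could fit together. It is not the code in this PR: `Format`, `DataFile`, `PartitionEvaluator`, `isValidTask()`, and the threshold parameters are hypothetical stand-ins, and the real Amoro classes work on Iceberg content files and carry much more state. The assumption that a single-file task is otherwise skipped is inferred from the description of the `IcebergPartitionPlan.java` change.

```java
import java.util.ArrayList;
import java.util.List;

public class AvroRewriteSketch {

  // Hypothetical stand-in for the file format of a content file.
  enum Format { AVRO, PARQUET, ORC }

  // Hypothetical minimal data file: only a format and a size in bytes.
  static final class DataFile {
    final Format format;
    final long sizeBytes;

    DataFile(Format format, long sizeBytes) {
      this.format = format;
      this.sizeBytes = sizeBytes;
    }
  }

  // Plays the role of the isAvroFile() utility described for ContentFiles.java.
  static boolean isAvroFile(DataFile file) {
    return file != null && file.format == Format.AVRO;
  }

  // Simplified evaluator that remembers whether any Avro file was added.
  static final class PartitionEvaluator {
    private final long fragmentSizeBytes; // assumed "small file" threshold
    private final int minFragmentCount;   // assumed number of small files that triggers optimizing
    private boolean hasAvroFile = false;
    private int fragmentCount = 0;

    PartitionEvaluator(long fragmentSizeBytes, int minFragmentCount) {
      this.fragmentSizeBytes = fragmentSizeBytes;
      this.minFragmentCount = minFragmentCount;
    }

    // Avro is checked first, so an Avro file is rewritten whatever its size.
    boolean fileShouldRewrite(DataFile file) {
      return isAvroFile(file) || file.sizeBytes < fragmentSizeBytes;
    }

    // Records the file and sets the Avro flag when one is seen.
    void addFile(DataFile file) {
      if (isAvroFile(file)) {
        hasAvroFile = true;
      } else if (file.sizeBytes < fragmentSizeBytes) {
        fragmentCount++;
      }
    }

    // The presence of an Avro file is enough to trigger optimization.
    boolean isNecessary() {
      return hasAvroFile || fragmentCount >= minFragmentCount;
    }
  }

  // Assumed task validation: a single-file task is skipped unless that file is Avro.
  static boolean isValidTask(List<DataFile> files) {
    if (files.isEmpty()) {
      return false;
    }
    if (files.size() == 1) {
      return isAvroFile(files.get(0));
    }
    return true;
  }

  public static void main(String[] args) {
    PartitionEvaluator evaluator = new PartitionEvaluator(16L << 20, 4);
    DataFile largeAvro = new DataFile(Format.AVRO, 128L << 20);
    evaluator.addFile(largeAvro);

    List<DataFile> task = new ArrayList<>();
    task.add(largeAvro);

    System.out.println("isNecessary: " + evaluator.isNecessary());
    System.out.println("fileShouldRewrite: " + evaluator.fileShouldRewrite(largeAvro));
    System.out.println("isValidTask: " + isValidTask(task));
  }
}
```

In this sketch a large Avro file, which would not qualify as a fragment by size alone, still triggers optimization and still forms a valid single-file task, while a single large Parquet file does neither.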
## Motivation
Avro is a row-oriented format, whereas Parquet and ORC are columnar. To keep
table performance and file formats consistent, Avro files should always be
rewritten to the preferred format during optimization, regardless of other
optimization conditions.
## Testing
- Verified that Avro files are correctly identified
- Confirmed that optimization is triggered when Avro files are present
- Tested that Avro files are always included in rewrite operations
## Checklist
- [x] Code changes are complete
- [x] Changes maintain backward compatibility
- [x] Code follows project conventions