kerwin-zk opened a new pull request, #8047: URL: https://github.com/apache/paimon/pull/8047
### Purpose

Spark currently disables MIN/MAX aggregate pushdown for any table with `deletion-vectors.enabled=true`. This is correct but too conservative: many DV-enabled non-primary-key tables, or many splits inside them, do not actually have deleted rows. In those cases the recorded file min/max stats are still tight and can safely answer MIN/MAX.

This PR makes the decision based on runtime split metadata instead of the table-level DV option. It derives whether a data file still has tight stats from `DataFileMeta.deleteRowCount` and the paired `DeletionFile.cardinality`, then allows Spark MIN/MAX pushdown only when every file in the split is tight.

This keeps the existing safety behavior for files with real deletes or unknown DV cardinality, while recovering MIN/MAX pushdown for DV-enabled tables/splits that have no effective deletions.

### Tests

CI
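The per-file decision described under Purpose can be illustrated with a minimal sketch. It is not the code in this PR: `FileInfo`, `isTight`, and `canPushDownMinMax` are hypothetical names, and the plain fields only stand in for `DataFileMeta.deleteRowCount` and `DeletionFile.cardinality`. It also assumes that a missing (null) value means "unknown" and must be treated as not tight, and that a file with no paired deletion file and zero recorded delete rows counts as tight.

```java
import java.util.Arrays;
import java.util.List;

class TightStatsSketch {

    // Hypothetical stand-in for one data file and its optional deletion vector.
    static final class FileInfo {
        final Long deleteRowCount;      // delete rows recorded for the data file; null = unknown
        final boolean hasDeletionFile;  // whether a deletion vector is paired with the file
        final Long dvCardinality;       // rows marked deleted by the vector; null = unknown

        FileInfo(Long deleteRowCount, boolean hasDeletionFile, Long dvCardinality) {
            this.deleteRowCount = deleteRowCount;
            this.hasDeletionFile = hasDeletionFile;
            this.dvCardinality = dvCardinality;
        }
    }

    // A file's min/max stats are treated as tight only when no row is known
    // or suspected to be deleted. Unknown values fall back to "not tight".
    static boolean isTight(FileInfo f) {
        if (f.deleteRowCount == null || f.deleteRowCount != 0L) {
            return false;
        }
        if (!f.hasDeletionFile) {
            return true;
        }
        return f.dvCardinality != null && f.dvCardinality == 0L;
    }

    // MIN/MAX pushdown is allowed for a split only if every file in it is tight.
    static boolean canPushDownMinMax(List<FileInfo> split) {
        return split.stream().allMatch(TightStatsSketch::isTight);
    }

    public static void main(String[] args) {
        FileInfo clean = new FileInfo(0L, false, null);
        FileInfo emptyDv = new FileInfo(0L, true, 0L);
        FileInfo realDeletes = new FileInfo(0L, true, 3L);
        FileInfo unknownDv = new FileInfo(0L, true, null);

        System.out.println(canPushDownMinMax(Arrays.asList(clean, emptyDv)));
        System.out.println(canPushDownMinMax(Arrays.asList(clean, realDeletes)));
        System.out.println(canPushDownMinMax(Arrays.asList(clean, unknownDv)));
    }
}
```

In this sketch a single file with real deletes or an unknown cardinality disables pushdown for the whole split, which mirrors the conservative behavior the description says is preserved.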
