Shekharrajak opened a new issue, #16430: URL: https://github.com/apache/iceberg/issues/16430
### Feature Request / Improvement

Iceberg stores sort_order_id per file in manifests, but SparkBatchQueryScan never implements SupportsReportOrdering, so BatchScanExec.outputOrdering always returns Nil.

We can implement SupportsReportOrdering in SparkBatchQueryScan: return the table's current SortOrder (converted via SortOrderToSpark) when all planned FileScanTasks share the same non-zero sort_order_id.

This would eliminate the pre-sort in sort-merge joins, ordered aggregations, and MOR compaction reads when the table has a defined sort order and all files are sorted consistently.

```
CREATE TABLE db.events (user_id BIGINT, event_time TIMESTAMP) USING iceberg;
ALTER TABLE db.events WRITE ORDERED BY event_time;
INSERT INTO db.events SELECT * FROM source;
EXPLAIN SELECT * FROM db.events ORDER BY event_time;
-- Today: Sort[event_time] → BatchScanExec (outputOrdering=Nil)
-- After: BatchScanExec (outputOrdering=[event_time ASC]), Sort eliminated
```

### Query engine

Spark

### Willingness to contribute

- [x] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
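A minimal sketch of the eligibility check described above, written as plain Java with no Iceberg or Spark dependencies. The class and method names (`ReportedOrderingSketch`, `canReportOrdering`) are hypothetical, and each planned file task is reduced to its manifest `sort_order_id`. It relies on ID 0 being Iceberg's unsorted order. Requiring the shared ID to match the table's current sort order ID is an extra assumption of this sketch, since the reported ordering would be derived from the current SortOrder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Iceberg: decides whether a scan could
// report the table's sort order as its output ordering.
final class ReportedOrderingSketch {
  // Iceberg reserves sort order ID 0 for the unsorted order.
  static final int UNSORTED_ORDER_ID = 0;

  // fileSortOrderIds: the sort_order_id of every planned file (null if unset).
  // tableSortOrderId: the ID of the table's current sort order.
  static boolean canReportOrdering(List<Integer> fileSortOrderIds, int tableSortOrderId) {
    if (tableSortOrderId == UNSORTED_ORDER_ID) {
      return false; // the table defines no ordering to report
    }
    if (fileSortOrderIds.isEmpty()) {
      return false; // conservative choice in this sketch: nothing planned, report nothing
    }
    for (Integer id : fileSortOrderIds) {
      // Every file must carry the table's current, non-zero sort order ID.
      if (id == null || id.intValue() != tableSortOrderId) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(canReportOrdering(Arrays.asList(1, 1, 1), 1)); // all files match
    System.out.println(canReportOrdering(Arrays.asList(1, 0, 1), 1)); // one unsorted file
  }
}
```

In an actual implementation the result of such a check would presumably gate what `outputOrdering()` returns: the converted sort order when it passes, an empty ordering otherwise.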
