zhuqi-lucas commented on code in PR #21956:
URL: https://github.com/apache/datafusion/pull/21956#discussion_r3246405372


##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -815,17 +975,53 @@ impl FileSource for ParquetSource {
             new
         };
 
-        // Check if the reversed ordering satisfies the requested ordering
-        if !reversed_eq_properties.ordering_satisfy(order.iter().cloned())? {
-            return Ok(SortOrderPushdownResult::Unsupported);
+        // Check if the reversed ordering satisfies the requested ordering.
+        // If yes, this is the "Inexact via row-group reversal" case: source
+        // declares ASC ordering, request is DESC (or vice versa), so iterating
+        // RGs in reverse approximates the requested order.
+        if reversed_eq_properties.ordering_satisfy(order.iter().cloned())? {
+            let sort_order = LexOrdering::new(order.iter().cloned());
+            let mut new_source = self.clone().with_reverse_row_groups(true);
+            new_source.sort_order_for_reorder = sort_order;
+            return Ok(SortOrderPushdownResult::Inexact {
+                inner: Arc::new(new_source) as Arc<dyn FileSource>,
+            });

Review Comment:
   Took a closer look while refactoring — partially merged, but kept a narrow 
fallback. Rationale spelled out in the new doc on `try_pushdown_sort`:
   
   - **Plain `Column` leading sort**: merged into the stats-based path as you 
suggested. This branch now sets both `sort_order_for_reorder` and 
`reverse_row_groups`, with the latter taken from the request's direction (which 
also fixes a latent source-DESC + request-ASC mismatch).
   - **Function-wrapped sort** (e.g. `date_trunc('day', ts) DESC`, `ceil(value) 
DESC`): retained as a reverse-only branch. `reorder_by_statistics` calls 
`StatisticsConverter::try_new(column_name, ...)` — parquet stores min/max keyed 
by physical column name, so it fundamentally needs a plain `Column` reference. 
The branches encode two genuinely different runtime operations:
   
     | | `reorder_by_statistics` | `reverse` |
     |---|---|---|
     | Cost | O(n log n) + metadata reads | O(n) iteration flip |
     | Needs parquet stats | yes | no |
     | Sort-expression constraint | leading must map to a physical column | 
none |
   
     So even if we extended `reorder_by_statistics` to multi-column or to 
monotonic function wrappers (good follow-up — currently it only uses 
`sort_order.first()` and only handles `Column`), the two would still need to 
coexist for the cases where one operation is available and the other isn't.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to