2010YOUY01 commented on code in PR #21479:
URL: https://github.com/apache/datafusion/pull/21479#discussion_r3068890632


##########
datafusion/physical-optimizer/src/optimizer.rs:
##########
@@ -125,6 +126,11 @@ impl PhysicalOptimizer {
             Arc::new(EnforceSorting::new()),
             // Run once after the local sorting requirement is changed
             Arc::new(OptimizeAggregateOrder::new()),
+            // WindowTopN: replaces Filter(rn<=K) → Window(ROW_NUMBER) → Sort
+            // with Window(ROW_NUMBER) → PartitionedTopKExec(fetch=K).
+            // Must run after EnforceSorting (which inserts SortExec) and 
before

Review Comment:
   An alternative is to move this rule before `EnforceSorting`, I think this 
can make the implementation simpler.
   
   This is optional, possibly try as follow-up for simplification, since I 
might have missed something though.
   
   See the detailed rationale:
   
   ## Background on window planning
   The initial physical plan of window function doesn't include `SortExec` and 
`RepartitionExec`, it simply declare the required sort/partitioning inside 
window operator, and rely on later `EnforceSorting` and `EnforceDistribution` 
physical optimizer rule to insert the `SortExec`. You can use the below script 
to check in `datafusion-cli`
   
   ```sql
   CREATE TABLE t (
     id INT,
     ts INT
   );
   
   INSERT INTO t VALUES
     (1, 10),
     (1, 20),
     (1, 30),
     (2, 15),
     (2, 25);
   
   EXPLAIN VERBOSE SELECT
     id,
     ts,
     ROW_NUMBER() OVER (
       PARTITION BY id
       ORDER BY ts
     ) AS rn
   FROM t
   QUALIFY rn < 3;
   ```
   
   ## Idea
   Move the rewrite rule before `EnforceSorting`, so we don't have to match 
`SortExec`, the plan pattern matching is likely to be simpler.
   The physical plan sanitizer still checks the ordering invariants to ensure 
things are still correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to