[GitHub] [arrow-datafusion] ming535 commented on a diff in pull request #2823: Eliminate multi limit-offset nodes to emptyRelation

GitBox Sun, 03 Jul 2022 17:22:51 -0700


ming535 commented on code in PR #2823:
URL: https://github.com/apache/arrow-datafusion/pull/2823#discussion_r912374937



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 
+fn eliminate_limit(
+    _optimizer: &EliminateLimit,
+    upper_offset: usize,
+    plan: &LogicalPlan,
+    _optimizer_config: &OptimizerConfig,
+) -> Result<LogicalPlan> {
+    match plan {
+        LogicalPlan::Limit(Limit {
+            skip: offset,
+            fetch: limit,
+            input,
+            ..
+        }) => {
+            // If upper's offset is equal or greater than current's limit,

Review Comment:
   // If ancestor's `offset`/`skip` is equal or greater than current's `fetch`,
   // replaces the current plan with an `LogicalPlan::EmptyRelation`.
   // For example, in the following query, the subquery `test_b` should be 
optimized as an empty table: SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS 
test_b LIMIT 2 OFFSET 5;
   



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 
+fn eliminate_limit(
+    _optimizer: &EliminateLimit,
+    upper_offset: usize,

Review Comment:
   maybe we can use `ancestor_skip` instead of `upper_offset` to make it 
consistent with code in `limit_push_down`



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 

Review Comment:
   I think we should also add some comments in top this file to clarify the 
intention of this optimizer rule in line 18 and line 27.
   ```
   //! Optimizer rule to replace...
   ```



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 
+fn eliminate_limit(
+    _optimizer: &EliminateLimit,
+    upper_offset: usize,
+    plan: &LogicalPlan,
+    _optimizer_config: &OptimizerConfig,
+) -> Result<LogicalPlan> {
+    match plan {
+        LogicalPlan::Limit(Limit {
+            skip: offset,
+            fetch: limit,
+            input,
+            ..
+        }) => {
+            // If upper's offset is equal or greater than current's limit,
+            // replaces with an [LogicalPlan::EmptyRelation].
+            // For such query: select * from (select * from xxx limit 5) a 
limit 2 offset 5;
+            match limit {
+                Some(limit) => {
+                    if *limit == 0 || upper_offset >= *limit {
+                        return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
+                            produce_one_row: false,
+                            schema: input.schema().clone(),
+                        }));
+                    }
+                }
+                None => {}
+            }
+
+            let offset = match offset {
+                Some(offset) => *offset,
+                None => 0,
+            };
+
+            let expr = plan.expressions();
+
+            // apply the optimization to all inputs of the plan
+            let inputs = plan.inputs();
+            let new_inputs = inputs
+                .iter()
+                .map(|plan| eliminate_limit(_optimizer, offset, plan, 
_optimizer_config))
+                .collect::<Result<Vec<_>>>()?;
+
+            from_plan(plan, &expr, &new_inputs)
+        }
+        // Rest: recurse and find possible LIMIT 0/Multi LIMIT OFFSET nodes
+        _ => {
+            // For those plans(projection/sort/..) which do not affect the 
output rows of sub-plans, we still use upper_offset;
+            // otherwise, use 0 instead.
+            let offset = match plan {

Review Comment:
   Consider this case: a `Join` has an input node that has `limit 10`, and the 
ancestor of `Join` has `offset 11`. If the optimizer is allowed to optimize in 
this situation, then the result might be wrong.



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 
+fn eliminate_limit(
+    _optimizer: &EliminateLimit,
+    upper_offset: usize,
+    plan: &LogicalPlan,
+    _optimizer_config: &OptimizerConfig,
+) -> Result<LogicalPlan> {
+    match plan {
+        LogicalPlan::Limit(Limit {
+            skip: offset,
+            fetch: limit,
+            input,

Review Comment:
   may be we should keep use `skip` and `fetch` instead of `offset` and `limit` 
to keep it consistent with code in `limit_push_down`



##########
datafusion/optimizer/src/eliminate_limit.rs:
##########
@@ -35,35 +35,80 @@ impl EliminateLimit {
     }
 }
 
+fn eliminate_limit(
+    _optimizer: &EliminateLimit,
+    upper_offset: usize,
+    plan: &LogicalPlan,
+    _optimizer_config: &OptimizerConfig,
+) -> Result<LogicalPlan> {
+    match plan {
+        LogicalPlan::Limit(Limit {
+            skip: offset,
+            fetch: limit,
+            input,
+            ..
+        }) => {
+            // If upper's offset is equal or greater than current's limit,
+            // replaces with an [LogicalPlan::EmptyRelation].
+            // For such query: select * from (select * from xxx limit 5) a 
limit 2 offset 5;
+            match limit {
+                Some(limit) => {
+                    if *limit == 0 || upper_offset >= *limit {
+                        return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
+                            produce_one_row: false,
+                            schema: input.schema().clone(),
+                        }));
+                    }
+                }
+                None => {}
+            }
+
+            let offset = match offset {

Review Comment:
   you may want to use this `let skip = skip.unwrap_or(0)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] ming535 commented on a diff in pull request #2823: Eliminate multi limit-offset nodes to emptyRelation

Reply via email to