Re: [PR] Add support for recursive CTEs [arrow-datafusion]

via GitHub Tue, 09 Jan 2024 09:30:57 -0800


matthewgapp commented on code in PR #7581:
URL: https://github.com/apache/arrow-datafusion/pull/7581#discussion_r1446401353



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -112,6 +112,8 @@ pub enum LogicalPlan {
     /// produces 0 or 1 row. This is used to implement SQL `SELECT`
     /// that has no values in the `FROM` clause.
     EmptyRelation(EmptyRelation),
+    /// A named temporary relation with a schema.
+    NamedRelation(NamedRelation),

Review Comment:
   @jonahgao, could you explain the rationale for your suggested strategy? I'd 
like to understand why it would be more effective than the current 
implementation. Performance is critical to our use case, and the recursion 
implementation is especially sensitive to it: the setup cost for execution and 
stream management isn't amortized over all input record batches but is incurred 
again on every iteration. For instance, we've observed up to a 30x speedup from 
eliminating certain intermediate nodes, such as coalesce, from our plan (see 
[this PR](https://github.com/matthewgapp/arrow-datafusion/pull/2)). I've also 
drafted another PR that appears to double execution speed again merely by 
omitting metric collection in recursive sub-graphs.
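
To illustrate the amortization point, here is a minimal, self-contained sketch of a working-table loop for a recursive CTE. It is not DataFusion code: `RunStats`, `run_recursive_cte`, and the `operators_in_recursive_term` parameter are hypothetical names that only model how per-iteration setup scales with the number of nodes in the recursive sub-graph.

```rust
/// Hypothetical counters for the per-iteration overhead (not a DataFusion type).
#[derive(Debug, Default, PartialEq)]
struct RunStats {
    iterations: usize,
    operator_setups: usize,
    rows_emitted: usize,
}

/// Evaluates the equivalent of
///   WITH RECURSIVE nums(n) AS (
///       SELECT 1 UNION ALL SELECT n + 1 FROM nums WHERE n < limit
///   ) SELECT n FROM nums
/// with a working-table loop.
///
/// `operators_in_recursive_term` is an assumed stand-in for the number of
/// plan nodes whose streams must be set up again on every iteration.
fn run_recursive_cte(limit: u64, operators_in_recursive_term: usize) -> (Vec<u64>, RunStats) {
    let mut stats = RunStats::default();
    let mut result = Vec::new();

    // Static term: evaluated exactly once.
    let mut working_table = vec![1u64];

    while !working_table.is_empty() {
        result.extend_from_slice(&working_table);
        stats.rows_emitted += working_table.len();

        // Recursive term: the setup cost is paid again for each iteration,
        // no matter how few rows the working table holds.
        stats.iterations += 1;
        stats.operator_setups += operators_in_recursive_term;

        working_table = working_table
            .iter()
            .filter(|&&n| n < limit)
            .map(|&n| n + 1)
            .collect();
    }

    (result, stats)
}
```

In this model, `run_recursive_cte(5, 3)` counts 15 operator setups for 5 emitted rows, while `run_recursive_cte(5, 2)` counts 10. Every node removed from the recursive sub-graph saves one setup per iteration, which is why small per-node costs add up when each iteration processes only a few rows.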



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
