Andrew Lamb created ARROW-11689:
-----------------------------------

             Summary: [Rust][DataFusion] Reduce copies in DataFusion 
LogicalPlan and Expr creation
                 Key: ARROW-11689
                 URL: https://issues.apache.org/jira/browse/ARROW-11689
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Andrew Lamb


The theme of this overall epic is to make the plan and expression rewriting 
phases of DataFusion more efficient by avoiding copies, leveraging the Rust 
type system.

Benefits:
* More standard / idiomatic Rust usage
* Faster / more efficient (I don't have numbers to back this up)

Downsides:
* These will be backwards incompatible changes


h1. Background

Many things in DataFusion look like

Input --transformation--> output

And the input is not used again. In Rust, you can model this by giving 
ownership of the input to the transformation.

At a high level, the idea is to avoid so much cloning in DataFusion.

The basic principle is that if a function needs to `clone` one of its 
arguments, the caller should be given the choice of when to do that. Often, 
the caller can give up ownership without issue.
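
As a minimal sketch of that principle (the `Plan` type and both functions are 
hypothetical stand-ins made up for illustration, not DataFusion's actual API), 
compare a function that borrows and must clone with one that takes ownership:

```rust
/// Hypothetical stand-in for a plan node; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    steps: Vec<String>,
}

/// Borrowing style: the function has to clone internally, even when the
/// caller has no further use for `input`.
fn add_step_borrowed(input: &Plan, step: &str) -> Plan {
    let mut output = input.clone(); // a copy the caller cannot opt out of
    output.steps.push(step.to_string());
    output
}

/// Owning style: the function consumes `input` and reuses its allocation.
/// No clone happens unless the caller asks for one.
fn add_step_owned(mut input: Plan, step: &str) -> Plan {
    input.steps.push(step.to_string());
    input
}
```

A caller that still needs the original writes 
`add_step_owned(plan.clone(), "filter")`; a caller that does not simply passes 
`plan`, and no copy is made.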

I envision at least the following items:
1. Optimizer passes that take `&LogicalPlan` and produce a new `LogicalPlan` 
even though most call sites do not need the original
2. Expr builder calls that take `&Expr` and return a new `Expr`
3. An expression rewriter (TODO) while running down optimizer passes
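
To illustrate what items 2 and 3 might look like, here is a rough sketch. The 
`Expr` enum, the `add` builder, and `fold_constants` are all hypothetical and 
much simpler than DataFusion's real types; the only point is that the builder 
and the rewriter take their inputs by value and move subtrees instead of 
deep-cloning them:

```rust
/// Hypothetical, simplified expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Builder that consumes both operands and moves them into the new
    /// node, so building an expression never clones a subtree.
    fn add(self, rhs: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(rhs))
    }
}

/// Rewriter that takes ownership of the expression. It folds
/// `Literal + Literal` into one literal (when the sum does not overflow)
/// and moves every other subtree through untouched.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) if a.checked_add(b).is_some() => {
                Expr::Literal(a + b)
            }
            // Not foldable: rebuild the node from the moved children.
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        // Leaves are returned as-is, without copying.
        other => other,
    }
}
```

With these signatures, a caller that wants to keep the original expression 
clones it explicitly before calling `fold_constants`; everyone else pays 
nothing for a copy.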


I think this style takes advantage of Rust's ownership model and will let us 
avoid a lot of copying and allocations, and avoid the need for something like 
slab allocators.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
