[ 
https://issues.apache.org/jira/browse/ARROW-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jörn Horstmann reassigned ARROW-10243:
--------------------------------------

    Assignee: Jörn Horstmann

> [Rust] [Datafusion] Optimize literal expression evaluation
> ----------------------------------------------------------
>
>                 Key: ARROW-10243
>                 URL: https://issues.apache.org/jira/browse/ARROW-10243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Jörn Horstmann
>            Assignee: Jörn Horstmann
>            Priority: Major
>         Attachments: flamegraph.svg
>
>
> While benchmarking the tpch query I noticed that the physical literal 
> expression takes up a sizable amount of time. I think the creation of the 
> corresponding array for numeric literals can be sped up by creating Buffer 
> and ArrayData directly without going through a builder. That would also allow 
> skipping the null bitmap for non-null literals.
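
The difference between the two construction paths can be sketched in plain Rust. This is only an illustration: `Float64Column`, `literal_via_builder` and `literal_direct` are hypothetical stand-ins, not the Arrow `Buffer`/`ArrayData` API, and the real builder does more work than this loop.

```rust
/// Hypothetical stand-in for a primitive array: a values buffer plus an
/// optional validity bitmap (`None` means "no nulls").
#[derive(Debug, PartialEq)]
struct Float64Column {
    values: Vec<f64>,
    validity: Option<Vec<u8>>,
}

/// Builder-style path: append one value at a time and set one validity bit
/// per value, even though the literal is never null.
fn literal_via_builder(value: f64, len: usize) -> Float64Column {
    let mut values = Vec::new();
    let mut validity = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        values.push(value);
        validity[i / 8] |= 1 << (i % 8);
    }
    Float64Column { values, validity: Some(validity) }
}

/// Direct path: fill the values buffer in a single allocation and skip the
/// bitmap entirely for a non-null literal.
fn literal_direct(value: f64, len: usize) -> Float64Column {
    Float64Column { values: vec![value; len], validity: None }
}
```

Both functions produce the same values; the direct path just avoids the per-element appends and the bitmap allocation.
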
> I'm also wondering whether it might be possible to cache the created array. 
> For queries without a WHERE clause, I'd expect all batches except the last to 
> have the same length. I'm not sure though where to store the cached value.
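
One way the caching idea could look, as a minimal sketch: the literal expression itself remembers the last array it built and reuses it when the batch length matches. `CachedLiteral` and its fields are hypothetical names, and a plain `Vec<f64>` behind an `Arc` stands in for the Arrow array.

```rust
use std::sync::Arc;

/// Hypothetical literal expression that remembers the last array it built.
struct CachedLiteral {
    value: f64,
    cache: Option<Arc<Vec<f64>>>,
    /// Number of times a fresh buffer had to be allocated.
    builds: usize,
}

impl CachedLiteral {
    fn new(value: f64) -> Self {
        CachedLiteral { value, cache: None, builds: 0 }
    }

    /// Return `len` copies of the literal, reusing the cached array when the
    /// requested length matches the previous batch.
    fn evaluate(&mut self, len: usize) -> Arc<Vec<f64>> {
        if let Some(cached) = &self.cache {
            if cached.len() == len {
                return Arc::clone(cached);
            }
        }
        let fresh = Arc::new(vec![self.value; len]);
        self.cache = Some(Arc::clone(&fresh));
        self.builds += 1;
        fresh
    }
}
```

This sketch takes `&mut self` for simplicity. If evaluation only has shared access to the expression, the cache would need interior mutability (for example a `Mutex`), which is part of the open question about where to store it.
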
> Another possible optimization would be to cast literals during logical 
> planning. In the tpch query the literal `1` is of type `u64` in the 
> logical plan and then needs to be processed by a cast kernel to convert to 
> `f64` for usage in an arithmetic expression.
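
Folding the cast at plan time could look roughly like the sketch below. The `Expr` enum and `fold_literal_casts` are hypothetical and much smaller than DataFusion's logical plan types; they only illustrate rewriting a cast of a `u64` literal into an `f64` literal once, so no cast kernel runs per batch.

```rust
/// Hypothetical, heavily simplified logical expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    LiteralU64(u64),
    LiteralF64(f64),
    Column(String),
    CastToF64(Box<Expr>),
    Subtract(Box<Expr>, Box<Expr>),
}

/// Rewrite `CAST(<u64 literal> AS f64)` into an f64 literal at plan time.
/// Note: `u64 as f64` loses precision for values above 2^53.
fn fold_literal_casts(expr: Expr) -> Expr {
    match expr {
        Expr::CastToF64(inner) => match fold_literal_casts(*inner) {
            Expr::LiteralU64(v) => Expr::LiteralF64(v as f64),
            Expr::LiteralF64(v) => Expr::LiteralF64(v),
            // Non-literal input: the cast still has to happen at run time.
            other => Expr::CastToF64(Box::new(other)),
        },
        Expr::Subtract(left, right) => Expr::Subtract(
            Box::new(fold_literal_casts(*left)),
            Box::new(fold_literal_casts(*right)),
        ),
        other => other,
    }
}
```

Applied to an expression shaped like `CAST(1 AS f64) - l_discount`, the rewrite leaves `1.0 - l_discount`, with the column reference untouched.
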
> The attached flamegraph is of 10 runs of tpch, with the data being loaded 
> into memory before running the queries (See ARROW-10240).
> {code}
> flamegraph ./target/release/tpch --iterations 10 --path ../tpch-dbgen 
> --format tbl --query 1 --batch-size 4096 -c1 --load
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
