alamb opened a new issue, #7988:
URL: https://github.com/apache/arrow-datafusion/issues/7988
### Describe the bug
There is a significant amount of code generated for array functions.
This both bloats binaries built with DataFusion and slows down compile
times.
### To Reproduce
```shell
cd datafusion/datafusion-cli
cargo bloat
```
```
 File  .text     Size                    Crate Name
 0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_all
 0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_n
 0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace
 0.1%   0.2% 150.3KiB                  parquet brotli::enc::prior_eval::PriorEval<Alloc>::update_cost_base
 0.1%   0.2% 124.6KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_repeat
 0.1%   0.2% 121.5KiB                   blake2 blake2::Blake2bVarCore::compress
 0.0%   0.1%  81.5KiB                   blake2 blake2::Blake2sVarCore::compress
 0.0%   0.1%  73.2KiB                   blake3 blake3::portable::compress_in_place
 0.0%   0.1%  65.2KiB                chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
 0.0%   0.1%  61.1KiB                sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
 0.0%   0.1%  61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_append
 0.0%   0.1%  61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_prepend
 0.0%   0.1%  60.5KiB                       h2 h2::codec::framed_read::decode_frame
 0.0%   0.1%  59.3KiB               datafusion datafusion::physical_planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
 0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_all
 0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_n
 0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove
 0.0%   0.1%  52.4KiB                       h2 h2::frame::headers::HeaderBlock::load::{{closure}}
 0.0%   0.1%  51.4KiB     datafusion_optimizer <datafusion_optimizer::simplify_expressions::expr_simplifier::Simplifier<S> as datafusion_common::tree_node::...
 0.0%   0.1%  48.9KiB datafusion_physical_expr datafusion_physical_expr::datetime_expressions::date_part
35.4%  97.6%  67.1MiB                          And 290367 smaller methods. Use -n N to show more.
36.3% 100.0%  68.7MiB                          .text section size, the file size is 189.3MiB
```
### Expected behavior
I would like the `array_replace_all`, `array_replace_n`, and `array_replace`
functions to be implemented in terms of arrow kernels (such as `eq` and
`take`) and manipulations of offset buffers rather than directly creating new
lists.
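
To make the idea concrete, here is a minimal sketch of a "replace all" built from an equality mask, a gather step, and reuse of the offsets. It is written against plain `Vec`s so it stands alone: `ToyList`, `eq_mask`, `take`, and `replace_all` are hypothetical names for illustration and are not the arrow-rs API. The sketch assumes the offsets start at 0 and cover every value.

```rust
/// Toy stand-in for a list array: `values` holds all elements back to back,
/// and row `i` spans `values[offsets[i]..offsets[i + 1]]`.
/// (Hypothetical type for illustration; not the arrow-rs `ListArray`.)
#[derive(Debug, Clone, PartialEq)]
struct ToyList {
    values: Vec<i64>,
    offsets: Vec<usize>,
}

/// Stand-in for an `eq`-style kernel: compare every element of each row
/// against that row's `from` value, producing one flat boolean mask.
fn eq_mask(list: &ToyList, from: &[i64]) -> Vec<bool> {
    let mut mask = vec![false; list.values.len()];
    for (row, window) in list.offsets.windows(2).enumerate() {
        for pos in window[0]..window[1] {
            mask[pos] = list.values[pos] == from[row];
        }
    }
    mask
}

/// Stand-in for a `take`-style kernel: gather `source[i]` for each index.
fn take(source: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| source[i]).collect()
}

/// Replace every occurrence of `from[row]` with `to[row]` in each row.
/// Row lengths do not change, so the offsets are reused unchanged.
fn replace_all(list: &ToyList, from: &[i64], to: &[i64]) -> ToyList {
    let mask = eq_mask(list, from);
    let n = list.values.len();

    // Gather source: the original values followed by the replacement values,
    // so index `n + row` selects the replacement for `row`.
    let mut source = list.values.clone();
    source.extend_from_slice(to);

    let mut indices = Vec::with_capacity(n);
    for (row, window) in list.offsets.windows(2).enumerate() {
        for pos in window[0]..window[1] {
            indices.push(if mask[pos] { n + row } else { pos });
        }
    }

    ToyList {
        values: take(&source, &indices),
        offsets: list.offsets.clone(),
    }
}
```

Only `eq_mask` and `take` touch element values here. If those two steps were dynamically typed kernels, the surrounding index and offset logic would presumably not need to be specialized per element type.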
For example, the large macro expansion here:
https://github.com/apache/arrow-datafusion/blob/bb1d7f9343532d5fa8df871ff42000fbe836d7d7/datafusion/physical-expr/src/array_expressions.rs#L1431-L1437
I believe generates a bunch of specialized code for each different list
element data type 😢
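
As a toy illustration of the suspected mechanism (this is not the DataFusion macro itself; `ToyColumn`, `count_runs`, and `dispatch_runs!` are hypothetical names), a macro that stamps out one match arm per element type causes the generic body to be compiled once per type:

```rust
/// Hypothetical column enum, standing in for a set of element data types.
enum ToyColumn {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

/// Generic body: the compiler emits a separate copy for each concrete `T`
/// it is called with. Counts runs of consecutive equal elements.
fn count_runs<T: PartialEq>(values: &[T]) -> usize {
    if values.is_empty() {
        0
    } else {
        1 + values.windows(2).filter(|w| w[0] != w[1]).count()
    }
}

/// Expands to one match arm per listed variant; each arm calls
/// `count_runs` with a different `T`, so each arm adds another copy.
macro_rules! dispatch_runs {
    ($col:expr, $($variant:ident),+ $(,)?) => {
        match $col {
            $(ToyColumn::$variant(values) => count_runs(values),)+
        }
    };
}

/// Three variants listed means three instantiations of `count_runs`.
fn runs(col: &ToyColumn) -> usize {
    dispatch_runs!(col, Int32, Int64, Float64)
}
```

With a small body like this the duplication is negligible, but the cost grows with both the size of the generic body and the number of types listed in the macro invocation.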
### Additional context
_No response_