alamb opened a new issue, #7988:
URL: https://github.com/apache/arrow-datafusion/issues/7988

   ### Describe the bug
   
   There is a significant amount of code generated for array functions. 
   
   This both bloats binaries built with DataFusion and slows compile times.
   
   ### To Reproduce
   
   ```shell
   cd datafusion/datafusion-cli
   cargo bloat
   ```
   
   ```
    File  .text     Size                    Crate Name
    0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_all
    0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace_n
    0.1%   0.2% 151.2KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_replace
    0.1%   0.2% 150.3KiB                  parquet brotli::enc::prior_eval::PriorEval<Alloc>::update_cost_base
    0.1%   0.2% 124.6KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_repeat
    0.1%   0.2% 121.5KiB                   blake2 blake2::Blake2bVarCore::compress
    0.0%   0.1%  81.5KiB                   blake2 blake2::Blake2sVarCore::compress
    0.0%   0.1%  73.2KiB                   blake3 blake3::portable::compress_in_place
    0.0%   0.1%  65.2KiB                chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
    0.0%   0.1%  61.1KiB                sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
    0.0%   0.1%  61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_append
    0.0%   0.1%  61.0KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_prepend
    0.0%   0.1%  60.5KiB                       h2 h2::codec::framed_read::decode_frame
    0.0%   0.1%  59.3KiB               datafusion datafusion::physical_planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
    0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_all
    0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove_n
    0.0%   0.1%  56.4KiB datafusion_physical_expr datafusion_physical_expr::array_expressions::array_remove
    0.0%   0.1%  52.4KiB                       h2 h2::frame::headers::HeaderBlock::load::{{closure}}
    0.0%   0.1%  51.4KiB     datafusion_optimizer <datafusion_optimizer::simplify_expressions::expr_simplifier::Simplifier<S> as datafusion_common::tree_node::...
    0.0%   0.1%  48.9KiB datafusion_physical_expr datafusion_physical_expr::datetime_expressions::date_part
   35.4%  97.6%  67.1MiB                          And 290367 smaller methods. Use -n N to show more.
   36.3% 100.0%  68.7MiB                          .text section size, the file size is 189.3MiB
   ```
   
   ### Expected behavior
   
   I would like the `array_replace_all`, `array_replace_n`, and `array_replace` functions to be implemented in terms of arrow kernels (such as `eq` and `take`) and manipulation of offset buffers, rather than by directly building new lists.
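
   A minimal sketch of that shape, in plain Rust with no arrow dependency. `ListColumn`, `eq_mask`, and `take` below are hypothetical stand-ins for a list array and for the `eq` and `take` kernels, not the real arrow API, and the scalar `from`/`to` arguments are a simplification. It only illustrates that a one-for-one replacement can rewrite the flat values and reuse the offsets unchanged.

   ```rust
   /// Simplified stand-in for a list array: all elements are flattened into
   /// `values`, and row `i` spans `values[offsets[i]..offsets[i + 1]]`.
   #[derive(Debug, PartialEq)]
   struct ListColumn {
       values: Vec<i64>,
       offsets: Vec<usize>,
   }

   /// Stand-in for an `eq` kernel: one pass over the flat values, yielding a mask.
   fn eq_mask(values: &[i64], needle: i64) -> Vec<bool> {
       values.iter().map(|v| *v == needle).collect()
   }

   /// Stand-in for a `take` kernel: gather `source[i]` for every index.
   fn take(source: &[i64], indices: &[usize]) -> Vec<i64> {
       indices.iter().map(|&i| source[i]).collect()
   }

   /// Replace every `from` with `to`. The replacement is one-for-one, so the
   /// offsets are reused unchanged and only the flat values are rewritten.
   fn replace_all(list: &ListColumn, from: i64, to: i64) -> ListColumn {
       let mask = eq_mask(&list.values, from);
       // Append the replacement once, then point every matching slot at it.
       let mut source = list.values.clone();
       source.push(to);
       let replacement_idx = list.values.len();
       let indices: Vec<usize> = mask
           .iter()
           .enumerate()
           .map(|(i, &hit)| if hit { replacement_idx } else { i })
           .collect();
       ListColumn {
           values: take(&source, &indices),
           offsets: list.offsets.clone(),
       }
   }

   fn main() {
       // Rows: [1, 2, 1], [], [3, 1]
       let list = ListColumn {
           values: vec![1, 2, 1, 3, 1],
           offsets: vec![0, 3, 3, 5],
       };
       println!("{:?}", replace_all(&list, 1, 9));
   }
   ```

   In this sketch the comparison and gather steps run over the flat child values, so the per-row logic is written once. With the real kernels, the element type would presumably be handled inside the kernel rather than in each array function.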
   
   For example, I believe the large macro expansion here generates a bunch of specialized code for each different list element data type 😢:
   
https://github.com/apache/arrow-datafusion/blob/bb1d7f9343532d5fa8df871ff42000fbe836d7d7/datafusion/physical-expr/src/array_expressions.rs#L1431-L1437
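
   To illustrate the kind of duplication suspected here, the following is a hypothetical sketch, not the DataFusion macro itself. All names in it are made up for illustration. Pattern A stamps out the whole row-walking body once per element type, while Pattern B keeps only the comparison type-specific and compiles the row logic once.

   ```rust
   // Pattern A: the macro copies the entire function body for each element type.
   macro_rules! count_matches_per_type {
       ($name:ident, $ty:ty) => {
           fn $name(values: &[$ty], offsets: &[usize], needle: $ty) -> Vec<usize> {
               offsets
                   .windows(2)
                   .map(|w| values[w[0]..w[1]].iter().filter(|v| **v == needle).count())
                   .collect()
           }
       };
   }

   // Three invocations produce three separate copies of the same logic.
   count_matches_per_type!(count_matches_i32, i32);
   count_matches_per_type!(count_matches_i64, i64);
   count_matches_per_type!(count_matches_u8, u8);

   // Pattern B: the row logic works on a boolean mask and is compiled once,
   // whatever element type produced the mask.
   fn count_per_row(mask: &[bool], offsets: &[usize]) -> Vec<usize> {
       offsets
           .windows(2)
           .map(|w| mask[w[0]..w[1]].iter().filter(|hit| **hit).count())
           .collect()
   }

   fn main() {
       // Rows: [1, 2, 1], [], [3, 1]
       let offsets = [0usize, 3, 3, 5];
       println!("{:?}", count_matches_i32(&[1, 2, 1, 3, 1], &offsets, 1));
       println!("{:?}", count_matches_i64(&[1, 2, 1, 3, 1], &offsets, 1));
       println!("{:?}", count_matches_u8(&[1, 2, 1, 3, 1], &offsets, 1));

       // Only this small comparison step depends on the element type.
       let mask: Vec<bool> = [1i32, 2, 1, 3, 1].iter().map(|v| *v == 1).collect();
       println!("{:?}", count_per_row(&mask, &offsets));
   }
   ```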
   
   
   
   ### Additional context
   
   _No response_

