alamb opened a new issue, #13816:
URL: https://github.com/apache/datafusion/issues/13816

   ### Is your feature request related to a problem or challenge?
   
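   The size of DataFusion's binary has grown significantly over the last few releases: the release build of `datafusion-cli` went from 68M in `39.0.0` to 92M on `main` (roughly 35% larger).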
   
   This likely leads to longer compile times as well as a larger overall binary size.
   
   | version | size of `datafusion-cli` binary |
   |--------|--------|
   |  `main` at 57d1309ec0830738af79c1885e514e19f324b1aa | 92M |
   | `43.0.0` | 87M |
   | `42.0.0` | 83M |
   | `41.0.0` | 72M |
   | `40.0.0` | 69M | 
   | `39.0.0` | 68M | 
   
   
   The sizes are measured like this:
   
   ```shell
   git checkout "$VERSION"   # e.g. 43.0.0, or main
   cd datafusion-cli
   cargo build --release
   du -h target/release/datafusion-cli
   ```
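
   The steps above can be scripted to measure several releases in one go. This is a hypothetical sketch, not part of DataFusion: `size_bytes` and `measure_versions` are made-up helper names, and it assumes it is run from the root of a clean datafusion checkout in which each version is reachable as a git tag or branch (e.g. `43.0.0`, `main`). It reports exact byte counts via `wc -c`, because `du -h` shows allocated disk usage rounded to whole units at these sizes.

   ```shell
   #!/usr/bin/env bash
   # Hypothetical helpers (not from DataFusion); source this file, then call
   # measure_versions. Assumes a clean datafusion checkout as the working dir.

   # Exact file size in bytes. `du -h` reports allocated disk usage rounded to
   # a human-readable unit, which hides differences smaller than about 1M here.
   size_bytes() {
     wc -c < "$1" | tr -d '[:space:]'
   }

   # For each git revision given (e.g. 39.0.0 43.0.0 main), check it out,
   # build datafusion-cli in release mode as in the steps above, and print
   # "<revision><TAB><bytes> bytes". Stops at the first failed checkout or build.
   measure_versions() {
     local rev bin
     for rev in "$@"; do
       git checkout --quiet "$rev" || return 1
       (cd datafusion-cli && cargo build --release --quiet) || return 1
       bin=datafusion-cli/target/release/datafusion-cli
       printf '%s\t%s bytes\n' "$rev" "$(size_bytes "$bin")"
     done
   }
   ```

   For example, after saving it as `measure.sh` (any name works), `source measure.sh && measure_versions 39.0.0 43.0.0 main` prints one line per revision. It leaves the checkout on the last revision measured.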
   
   Also, people such as @g3blv have noticed that the WASM build has increased in size by 50%:
   
https://github.com/apache/datafusion/discussions/9834#discussioncomment-11574581
   
   ### Describe the solution you'd like
   
   I would like to reduce the binary size of DataFusion, if possible.
   
   At the very least, I would like to understand where the code size comes from and offer hints about how to reduce it if needed.
   
   ### Describe alternatives you've considered
   
   A common source of code size is generic (templated) functions: the compiler generates a separate copy of such a function for each combination of concrete types it is used with (monomorphization).
   
   
   Here is some fascinating information from running `cargo bloat -p datafusion`:
   
   ```text
    File  .text    Size                          Crate Name
    0.1%   0.3% 79.7KiB                         blake2 blake2::Blake2bVarCore::compress
    0.1%   0.2% 70.7KiB                         blake2 blake2::Blake2sVarCore::compress
    0.1%   0.2% 67.1KiB                      sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
    0.1%   0.2% 61.4KiB                         blake3 _blake3_hash4_neon
    0.1%   0.2% 56.4KiB                      chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
    0.1%   0.2% 44.7KiB                     arrow_cast <i64 as lexical_write_integer::api::ToLexical>::to_lexical
    0.1%   0.1% 42.8KiB                     arrow_cast arrow_cast::cast::cast_with_options
    0.0%   0.1% 35.9KiB                           rand <rand_chacha::chacha::ChaCha12Core as rand_core::block::BlockRngCore>::generate
    0.0%   0.1% 34.9KiB                     arrow_cast lexical_parse_float::slow::parse_mantissa
    0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
    0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
    0.0%   0.1% 29.0KiB                 regex_automata regex_automata::hybrid::search::find_fwd
    0.0%   0.1% 27.6KiB                         blake3 blake3::portable::compress_in_place
    0.0%   0.1% 27.1KiB                   aho_corasick aho_corasick::automaton::try_find_fwd
    0.0%   0.1% 25.2KiB                      sqlparser <sqlparser::ast::Expr as core::fmt::Display>::fmt
    0.0%   0.1% 23.8KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
    0.0%   0.1% 23.7KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
    0.0%   0.1% 23.7KiB       datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
    0.0%   0.1% 23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
    0.0%   0.1% 22.0KiB                     arrow_cast <u64 as lexical_write_integer::api::ToLexical>::to_lexical
   36.7%  97.4% 27.7MiB                                And 139272 smaller methods. Use -n N to show more.
   37.7% 100.0% 28.4MiB                                .text section size, the file size is 75.4MiB
   ```
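
   The listing above already shows repetition: `datafusion_common::scalar::ScalarValue::iter_to_array` appears four times (twice under `datafusion_common`, once each under `datafusion_physical_expr` and `datafusion_functions_aggregate`), about 95KiB in total, which is consistent with one generic function being instantiated in several crates. A rough way to find such repeats in a longer listing is to group the rows by symbol name. The helper below is a hypothetical sketch, not a cargo-bloat feature: `duplicated_functions` is a made-up name, and it assumes the default column layout shown above (File, .text, Size, Crate, Name) with sizes in B, KiB or MiB.

   ```shell
   #!/usr/bin/env bash
   # Hypothetical helper (not part of cargo-bloat); source this file first.
   # Reads `cargo bloat` output on stdin and prints, for every symbol name
   # that appears more than once: copy count, combined size, name.
   duplicated_functions() {
     awk '
       # Data rows only: the 3rd column is a size such as 23.8KiB or 512B.
       $3 ~ /^[0-9.]+(B|KiB|MiB)$/ {
         size = $3 + 0                      # numeric prefix, e.g. 23.8
         if ($3 ~ /MiB$/) size *= 1024      # normalize all sizes to KiB
         else if ($3 !~ /KiB$/) size /= 1024
         # The symbol name is every column after the crate column.
         name = $5
         for (i = 6; i <= NF; i++) name = name " " $i
         count[name]++
         total[name] += size
       }
       END {
         for (n in count)
           if (count[n] > 1) printf "%d\t%.1f KiB\t%s\n", count[n], total[n], n
       }
     ' | sort -t $'\t' -k2,2 -rn   # largest combined size first
   }
   ```

   For example, `cargo bloat -p datafusion -n 1000 | duplicated_functions` would list the repeated names among the 1000 largest functions. Only the rows actually printed are considered, so a larger `-n` catches more copies.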
   
   ### Additional context
   
   _No response_

