jhorstmann opened a new issue, #1858:
URL: https://github.com/apache/arrow-rs/issues/1858

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Incremental compile times feel like they have gotten slower some time ago. 
While investigating I notices that a large amount of llvm code is generated for 
the comparison kernels, especially dyn and dict versions. These end up calling 
the generic `BooleanArray::from_iter` method which therefore gets duplicated 
for each possible iterator type.
   
   ```
   $ cargo llvm-lines | head -n30
      Compiling arrow v16.0.0 
(/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
       Finished dev [unoptimized + debuginfo] target(s) in 35.58s
     Lines           Copies        Function name
     -----           ------        -------------
     3156218 (100%)  73225 (100%)  (TOTAL)
      292166 (9.3%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray 
as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
      112200 (3.6%)   1545 (2.1%)  core::iter::traits::iterator::Iterator::fold
       92112 (2.9%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict
       85538 (2.7%)   1236 (1.7%)  
<core::iter::adapters::enumerate::Enumerate<I> as 
core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
       84972 (2.7%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray 
as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
       83903 (2.7%)   1478 (2.0%)  
core::iter::adapters::map::map_fold::{{closure}}
       80639 (2.6%)   1533 (2.1%)  core::option::Option<T>::map
       80304 (2.5%)    912 (1.2%)  
arrow::compute::kernels::comparison::cmp_dict::{{closure}}
       78427 (2.5%)   1478 (2.0%)  <core::iter::adapters::map::Map<I,F> as 
core::iter::traits::iterator::Iterator>::fold
       65942 (2.1%)   1236 (1.7%)  
<core::iter::adapters::enumerate::Enumerate<I> as 
core::iter::traits::iterator::Iterator>::fold
       59070 (1.9%)   1517 (2.1%)  
core::iter::traits::iterator::Iterator::for_each
       39108 (1.2%)   2360 (3.2%)  core::iter::adapters::map::Map<I,F>::new
       38034 (1.2%)    137 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
       37336 (1.2%)   1511 (2.1%)  
core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
       33218 (1.1%)     62 (0.1%)  core::slice::sort::partition_in_blocks
       29970 (0.9%)     81 (0.1%)  arrow::compute::kernels::take::take_primitive
       27904 (0.9%)   2360 (3.2%)  core::iter::traits::iterator::Iterator::map
       27631 (0.9%)    249 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as 
core::iter::adapters::zip::ZipImpl<A,B>>::next
       27392 (0.9%)    202 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as 
core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
       26906 (0.9%)    140 (0.2%)  
arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
       26764 (0.8%)    136 (0.2%)  
arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
       25676 (0.8%)    277 (0.4%)  
core::iter::traits::iterator::Iterator::try_fold
       22117 (0.7%)    257 (0.4%)  <alloc::vec::Vec<T> as 
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
       19091 (0.6%)    215 (0.3%)  
<core::iter::adapters::enumerate::Enumerate<I> as 
core::iter::traits::iterator::Iterator>::next
       18150 (0.6%)    415 (0.6%)  core::option::Option<T>::ok_or_else
       17976 (0.6%)    168 (0.2%)  
arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
       17430 (0.6%)   1478 (2.0%)  core::iter::adapters::map::map_fold
   ```
   
   Removing the dyn and dict comparison kernels gives the following number of 
lines (but this is of course not a realistic option):
   
   ```
   $ cargo llvm-lines | head -n30
      Compiling arrow v16.0.0 
(/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
   
     Lines           Copies        Function name
     -----           ------        -------------
     1737110 (100%)  41764 (100%)  (TOTAL)
       49753 (2.9%)    939 (2.2%)  core::option::Option<T>::map
       38034 (2.2%)    137 (0.3%)  arrow::util::trusted_len::trusted_len_unzip
       33218 (1.9%)     62 (0.1%)  core::slice::sort::partition_in_blocks
       30112 (1.7%)    385 (0.9%)  core::iter::traits::iterator::Iterator::fold
       29970 (1.7%)     81 (0.2%)  arrow::compute::kernels::take::take_primitive
       26906 (1.5%)    140 (0.3%)  
arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
       26764 (1.5%)    136 (0.3%)  
arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
       25676 (1.5%)    277 (0.7%)  
core::iter::traits::iterator::Iterator::try_fold
       22117 (1.3%)    257 (0.6%)  <alloc::vec::Vec<T> as 
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
       19091 (1.1%)    215 (0.5%)  
<core::iter::adapters::enumerate::Enumerate<I> as 
core::iter::traits::iterator::Iterator>::next
       18337 (1.1%)    318 (0.8%)  
core::iter::adapters::map::map_fold::{{closure}}
       16947 (1.0%)    318 (0.8%)  <core::iter::adapters::map::Map<I,F> as 
core::iter::traits::iterator::Iterator>::fold
       16576 (1.0%)     64 (0.2%)  
arrow::compute::kernels::cast::pack_numeric_to_dictionary
       16177 (0.9%)    261 (0.6%)  <alloc::vec::Vec<T,A> as 
alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
       15683 (0.9%)     41 (0.1%)  
<arrow::array::array_string::GenericStringArray<OffsetSize> as 
core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
       14998 (0.9%)    838 (2.0%)  core::iter::adapters::map::Map<I,F>::new
       14586 (0.8%)     86 (0.2%)  
arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
       14551 (0.8%)     96 (0.2%)  
arrow::buffer::mutable::MutableBuffer::extend_from_iter
       14145 (0.8%)     77 (0.2%)  
<arrow::array::array_primitive::PrimitiveArray<T> as 
core::iter::traits::collect::FromIterator<Ptr>>::from_iter
       13912 (0.8%)     39 (0.1%)  arrow::array::array::print_long_array
       13830 (0.8%)    357 (0.9%)  
core::iter::traits::iterator::Iterator::for_each
       13743 (0.8%)     27 (0.1%)  
arrow::compute::kernels::filter::filter_primitive
       13726 (0.8%)     62 (0.1%)  core::slice::sort::partition
       13125 (0.8%)     98 (0.2%)  <core::iter::adapters::GenericShunt<I,R> as 
core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
       12256 (0.7%)     64 (0.2%)  
arrow::array::builder::PrimitiveDictionaryBuilder<K,V>::append
       11804 (0.7%)     62 (0.1%)  core::slice::sort::partition_equal
       11215 (0.6%)     88 (0.2%)  <arrow::buffer::immutable::Buffer as 
core::iter::traits::collect::FromIterator<T>>::from_iter
   ```
   
   This seems to indicate that nearly half of the llvm code is caused by the 
comparison kernels.
   
   **Describe the solution you'd like**
   
   I'm not sure whether there is a good solution for this, but wanted to 
document these findings.
   
   The kernels use macros very extensively, but replacing these with generic 
functions will probably lead to the same amount of llvm lines because of 
monomorphization. Maybe some of the non-generic common code could be extracted 
into helper methods.
   
   **Describe alternatives you've considered**
   
   The `dyn` kernels could also be put behind a feature flag, but that would 
again complicate the test matrix.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to