jhorstmann opened a new issue, #1858:
URL: https://github.com/apache/arrow-rs/issues/1858
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Incremental compile times feel like they have gotten slower some time ago.
While investigating I notices that a large amount of llvm code is generated for
the comparison kernels, especially dyn and dict versions. These end up calling
the generic `BooleanArray::from_iter` method which therefore gets duplicated
for each possible iterator type.
```
$ cargo llvm-lines | head -n30
Compiling arrow v16.0.0
(/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
Finished dev [unoptimized + debuginfo] target(s) in 35.58s
Lines Copies Function name
----- ------ -------------
3156218 (100%) 73225 (100%) (TOTAL)
292166 (9.3%) 1164 (1.6%) <arrow::array::array_boolean::BooleanArray
as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
112200 (3.6%) 1545 (2.1%) core::iter::traits::iterator::Iterator::fold
92112 (2.9%) 912 (1.2%) arrow::compute::kernels::comparison::cmp_dict
85538 (2.7%) 1236 (1.7%)
<core::iter::adapters::enumerate::Enumerate<I> as
core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
84972 (2.7%) 1164 (1.6%) <arrow::array::array_boolean::BooleanArray
as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
83903 (2.7%) 1478 (2.0%)
core::iter::adapters::map::map_fold::{{closure}}
80639 (2.6%) 1533 (2.1%) core::option::Option<T>::map
80304 (2.5%) 912 (1.2%)
arrow::compute::kernels::comparison::cmp_dict::{{closure}}
78427 (2.5%) 1478 (2.0%) <core::iter::adapters::map::Map<I,F> as
core::iter::traits::iterator::Iterator>::fold
65942 (2.1%) 1236 (1.7%)
<core::iter::adapters::enumerate::Enumerate<I> as
core::iter::traits::iterator::Iterator>::fold
59070 (1.9%) 1517 (2.1%)
core::iter::traits::iterator::Iterator::for_each
39108 (1.2%) 2360 (3.2%) core::iter::adapters::map::Map<I,F>::new
38034 (1.2%) 137 (0.2%) arrow::util::trusted_len::trusted_len_unzip
37336 (1.2%) 1511 (2.1%)
core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
33218 (1.1%) 62 (0.1%) core::slice::sort::partition_in_blocks
29970 (0.9%) 81 (0.1%) arrow::compute::kernels::take::take_primitive
27904 (0.9%) 2360 (3.2%) core::iter::traits::iterator::Iterator::map
27631 (0.9%) 249 (0.3%) <core::iter::adapters::zip::Zip<A,B> as
core::iter::adapters::zip::ZipImpl<A,B>>::next
27392 (0.9%) 202 (0.3%) <core::iter::adapters::zip::Zip<A,B> as
core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
26906 (0.9%) 140 (0.2%)
arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
26764 (0.8%) 136 (0.2%)
arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
25676 (0.8%) 277 (0.4%)
core::iter::traits::iterator::Iterator::try_fold
22117 (0.7%) 257 (0.4%) <alloc::vec::Vec<T> as
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
19091 (0.6%) 215 (0.3%)
<core::iter::adapters::enumerate::Enumerate<I> as
core::iter::traits::iterator::Iterator>::next
18150 (0.6%) 415 (0.6%) core::option::Option<T>::ok_or_else
17976 (0.6%) 168 (0.2%)
arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
17430 (0.6%) 1478 (2.0%) core::iter::adapters::map::map_fold
```
Removing the dyn and dict comparison kernels gives the following number of
lines (but this is of course not a realistic option):
```
$ cargo llvm-lines | head -n30
Compiling arrow v16.0.0
(/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
Lines Copies Function name
----- ------ -------------
1737110 (100%) 41764 (100%) (TOTAL)
49753 (2.9%) 939 (2.2%) core::option::Option<T>::map
38034 (2.2%) 137 (0.3%) arrow::util::trusted_len::trusted_len_unzip
33218 (1.9%) 62 (0.1%) core::slice::sort::partition_in_blocks
30112 (1.7%) 385 (0.9%) core::iter::traits::iterator::Iterator::fold
29970 (1.7%) 81 (0.2%) arrow::compute::kernels::take::take_primitive
26906 (1.5%) 140 (0.3%)
arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
26764 (1.5%) 136 (0.3%)
arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
25676 (1.5%) 277 (0.7%)
core::iter::traits::iterator::Iterator::try_fold
22117 (1.3%) 257 (0.6%) <alloc::vec::Vec<T> as
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
19091 (1.1%) 215 (0.5%)
<core::iter::adapters::enumerate::Enumerate<I> as
core::iter::traits::iterator::Iterator>::next
18337 (1.1%) 318 (0.8%)
core::iter::adapters::map::map_fold::{{closure}}
16947 (1.0%) 318 (0.8%) <core::iter::adapters::map::Map<I,F> as
core::iter::traits::iterator::Iterator>::fold
16576 (1.0%) 64 (0.2%)
arrow::compute::kernels::cast::pack_numeric_to_dictionary
16177 (0.9%) 261 (0.6%) <alloc::vec::Vec<T,A> as
alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
15683 (0.9%) 41 (0.1%)
<arrow::array::array_string::GenericStringArray<OffsetSize> as
core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
14998 (0.9%) 838 (2.0%) core::iter::adapters::map::Map<I,F>::new
14586 (0.8%) 86 (0.2%)
arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
14551 (0.8%) 96 (0.2%)
arrow::buffer::mutable::MutableBuffer::extend_from_iter
14145 (0.8%) 77 (0.2%)
<arrow::array::array_primitive::PrimitiveArray<T> as
core::iter::traits::collect::FromIterator<Ptr>>::from_iter
13912 (0.8%) 39 (0.1%) arrow::array::array::print_long_array
13830 (0.8%) 357 (0.9%)
core::iter::traits::iterator::Iterator::for_each
13743 (0.8%) 27 (0.1%)
arrow::compute::kernels::filter::filter_primitive
13726 (0.8%) 62 (0.1%) core::slice::sort::partition
13125 (0.8%) 98 (0.2%) <core::iter::adapters::GenericShunt<I,R> as
core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
12256 (0.7%) 64 (0.2%)
arrow::array::builder::PrimitiveDictionaryBuilder<K,V>::append
11804 (0.7%) 62 (0.1%) core::slice::sort::partition_equal
11215 (0.6%) 88 (0.2%) <arrow::buffer::immutable::Buffer as
core::iter::traits::collect::FromIterator<T>>::from_iter
```
This seems to indicate that nearly half of the llvm code is caused by the
comparison kernels.
**Describe the solution you'd like**
I'm not sure whether there is a good solution for this, but wanted to
document these findings.
The kernels use macros very extensively, but replacing these with generic
functions will probably lead to the same amount of llvm lines because of
monomorphization. Maybe some of the non-generic common code could be extracted
into helper methods.
**Describe alternatives you've considered**
The `dyn` kernels could also be put behind a feature flag, but that would
again complicate the test matrix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]