[ https://issues.apache.org/jira/browse/ARROW-10026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197321#comment-17197321 ]
Frank Du commented on ARROW-10026:
----------------------------------
Hotspots at the L1 cache size:
{code:java}
48.92% libarrow.so.200.0.0 [.] arrow::compute::internal::applicator::ScalarBinary<arrow::Int64Type, arrow::Int64Type, arrow::Int64Type, arrow::compute::internal
2.81% arrow-compute-scalar-arithmetic-benchmark [.] mpark::detail::visitation::base::make_fmatrix_impl<mpark::detail::dtor&&, mpark::detail::base<(mpark::detail::Trait)1, decltype(n
2.66% libc-2.31.so [.] malloc
2.54% libpthread-2.31.so [.] __pthread_mutex_trylock
2.28% libarrow.so.200.0.0 [.] arrow::ArrayData::Slice
1.55% libarrow.so.200.0.0 [.] mpark::detail::visitation::base::make_fdiagonal_impl<mpark::detail::assignment<mpark::detail::traits<decltype(nullptr), std::shar
1.48% libarrow.so.200.0.0 [.] arrow::compute::KernelSignature::MatchesInputs
1.39% libpthread-2.31.so [.] __pthread_mutex_unlock
1.33% libarrow.so.200.0.0 [.] arrow::compute::detail::ScalarExecutor::Execute
1.23% libarrow_testing.so.200.0.0 [.] std::vector<std::shared_ptr<arrow::Buffer>, std::allocator<std::shared_ptr<arrow::Buffer> > >::~vector
{code}
Hotspots at the L2 cache size:
{code:java}
78.21% libarrow.so.200.0.0 [.] arrow::compute::internal::applicator::ScalarBinary<arrow::Int64Type, arrow::Int64Type, arrow::Int64Type, arrow::compute::internal
0.93% libarrow.so.200.0.0 [.] arrow::internal::(anonymous namespace)::BitmapOp<std::bit_and>
0.87% [kernel] [k] mwait_idle_with_hints.constprop.0
0.82% libpthread-2.31.so [.] __pthread_mutex_trylock
0.80% arrow-compute-scalar-arithmetic-benchmark [.] mpark::detail::visitation::base::make_fmatrix_impl<mpark::detail::dtor&&, mpark::detail::base<(mpark::detail::Trait)1, decltype(n
0.73% libc-2.31.so [.] malloc
0.69% [kernel] [k] io_serial_out
0.64% [kernel] [k] io_serial_in
0.56% libarrow.so.200.0.0 [.] arrow::compute::KernelSignature::MatchesInputs
0.54% libarrow.so.200.0.0 [.] arrow::compute::detail::ScalarExecutor::Execute
0.43% libarrow.so.200.0.0 [.] arrow::ArrayData::Slice
{code}
For small batches, there is a lot of additional overhead in the benchmark itself.
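As a rough reading of the two profiles above, the share of samples that fall outside the ScalarBinary kernel can be taken directly from the top entry of each listing. The snippet below is only a sketch: the function name is made up for illustration, and it treats everything other than the top kernel symbol as overhead, which is a simplification.

```python
def non_kernel_share(kernel_percent: float) -> float:
    """Percentage of perf samples not attributed to the kernel symbol."""
    return 100.0 - kernel_percent


# Top-entry percentages copied from the two perf listings above.
profiles = {"L1 size": 48.92, "L2 size": 78.21}

for label, kernel_percent in profiles.items():
    share = non_kernel_share(kernel_percent)
    print(f"{label}: {share:.2f}% of samples outside ScalarBinary")
```

Under that simplification, about half of the samples at the L1 size are spent outside the kernel, against roughly a fifth at the L2 size.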
> [C++] Improve kernel performance on small batches
> -------------------------------------------------
>
> Key: ARROW-10026
> URL: https://issues.apache.org/jira/browse/ARROW-10026
> Project: Apache Arrow
> Issue Type: Task
> Components: C++
> Reporter: Antoine Pitrou
> Priority: Major
>
> It seems that invoking some kernels on smallish batches has quite an overhead:
> {code}
> ArrayArrayKernel<Add, Int32Type>/32768/100 2860 ns 2859 ns 245195 bytes_per_second=10.6727G/s items_per_second=2.86494G/s null_percent=1 size=32.768k
> ArrayArrayKernel<Add, Int32Type>/32768/0 2752 ns 2751 ns 249316 bytes_per_second=11.093G/s items_per_second=2.97775G/s null_percent=0 size=32.768k
> ArrayArrayKernel<Add, Int32Type>/524288/100 18633 ns 18630 ns 36548 bytes_per_second=26.2097G/s items_per_second=7.03561G/s null_percent=1 size=524.288k
> ArrayArrayKernel<Add, Int32Type>/524288/0 18260 ns 18257 ns 38245 bytes_per_second=26.7451G/s items_per_second=7.17933G/s null_percent=0 size=524.288k
> {code}
> We should investigate and try to lighten the overhead.
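One rough way to put a number on the overhead described in the issue is to fit a straight line, time = fixed + per_byte * size, through the two null_percent=0 rows quoted above. The sketch below does that; the function name is hypothetical, and the linear model is an assumption that ignores cache effects (the larger batch no longer fits in L1, so its per-byte cost may differ), so the result is an order-of-magnitude estimate at best.

```python
def fit_fixed_and_per_byte(small, large):
    """Solve t = fixed + per_byte * size through two (size_bytes, time_ns) points."""
    (n1, t1), (n2, t2) = small, large
    per_byte = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_byte * n1
    return fixed, per_byte


# (size in bytes, wall time in ns) from the null_percent=0 rows of the issue.
small = (32768, 2752.0)
large = (524288, 18260.0)

fixed_ns, ns_per_byte = fit_fixed_and_per_byte(small, large)
print(f"estimated fixed cost per call: {fixed_ns:.0f} ns")
print(f"estimated marginal cost: {ns_per_byte:.4f} ns/byte")
print(f"fixed share of the small-batch time: {fixed_ns / small[1]:.0%}")
```

With these two points the fit gives a fixed cost of about 1.7 µs per call, or roughly 60% of the time measured for the 32768-byte batch.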
--
This message was sent by Atlassian Jira
(v8.3.4#803005)