cyb70289 commented on PR #49756:
URL: https://github.com/apache/arrow/pull/49756#issuecomment-4310376121
> But it's impressive that SVE128 is always significantly better than NEON.
> That's rather good news, given that most SVE implementations have 128-bit
> vectors. @cyb70289
Interesting. That shouldn't happen if both paths use equivalent SIMD operations.
I tested one case, `BM_UnpackBool/{Neon,Sve128}Unaligned/1/32`, on a Neoverse
N2 server: SVE delivers about 1.6x the throughput of Neon (4.56G vs. 2.90G
items/s). The profile suggests that the Neon code path does not inline
frequently called functions such as `load_val_as`, which introduces high
overhead.
**Benchmark**
```
BM_UnpackBool/NeonUnaligned/1/32     11.1 ns   11.0 ns   63329847 items_per_second=2.89667G/s
BM_UnpackBool/Sve128Unaligned/1/32   7.02 ns   7.02 ns   99743767 items_per_second=4.55851G/s
```
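The relative speedup can be read directly off the benchmark output, either from `items_per_second` (higher is better) or from the per-iteration CPU time (lower is better). A minimal sketch of that arithmetic; the function names are illustrative only and not part of Arrow or Google Benchmark:

```cpp
#include <cassert>

// Speedup of variant B over variant A, using throughput (items/s).
// Higher throughput is better, so the ratio is B / A.
double speedup_from_throughput(double a_items_per_s, double b_items_per_s) {
  assert(a_items_per_s > 0.0);
  return b_items_per_s / a_items_per_s;
}

// Speedup of variant B over variant A, using CPU time per iteration (ns).
// Lower time is better, so the ratio is A / B.
double speedup_from_time(double a_ns, double b_ns) {
  assert(b_ns > 0.0);
  return a_ns / b_ns;
}
```

Plugging in the Neon (A) and SVE128 (B) numbers gives roughly 1.57x from throughput (4.55851G / 2.89667G) and from CPU time (11.0 ns / 7.02 ns), so both metrics agree.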
**Neon hotspots show that `load_val_as` is not inlined**
```
+ 45.20% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_width<1, arrow::internal::bpacking::KernelNeon, bool>(unsigned char const
+ 31.03% arrow-bpacking- libarrow.so.2500.0.0 [.] xsimd::batch<unsigned char, xsimd::neon64> arrow::internal::bpacking::load_val_as<unsigned int, xsimd::neon64>(u
+ 7.08% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::MediumKernel<arrow::internal::bpacking::KernelTraits<bool, 1, xsimd::neon64>, ar
+ 6.38% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::MediumKernel<arrow::internal::bpacking::KernelTraits<bool, 1, xsimd::neon64>, ar
+ 2.63% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_neon<bool>(unsigned char const*, bool*, arrow::internal::UnpackOptions co
+ 1.95% arrow-bpacking- arrow-bpacking-benchmark [.] arrow::internal::(anonymous namespace)::BM_UnpackBool(benchmark::State&, bool, void (*)(unsigned char const*, bo
+ 1.82% arrow-bpacking- libarrow.so.2500.0.0 [.] xsimd::batch<unsigned char, xsimd::neon64> arrow::internal::bpacking::load_val_as<unsigned int, xsimd::neon64>(u
+ 1.37% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_width<1, arrow::internal::bpacking::KernelNeon, bool>(unsigned char const
```
**No such issue in the SVE128 code path**
```
+ 89.89% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_width<1, arrow::int
+ 4.18% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_sve128<bool>(unsign
+ 3.11% arrow-bpacking- arrow-bpacking-benchmark [.] arrow::internal::(anonymous namespace)::BM_UnpackBool(benc
+ 1.47% arrow-bpacking- libarrow.so.2500.0.0 [.] void arrow::internal::bpacking::unpack_width<1, arrow::int
```
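One way to confirm that the gap comes from the missed inlining rather than from the SIMD instructions themselves would be to force-inline the small per-iteration helper and re-profile. The sketch below is a scalar stand-in, not Arrow's code: `FORCE_INLINE`, `load_word`, and `unpack_bits_to_bool` are hypothetical names, it assumes GCC/Clang `__attribute__((always_inline))` (or MSVC `__forceinline`), and the bit order assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical force-inline macro; the compiler must inline functions marked
// with it instead of treating `inline` as a hint.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Hypothetical stand-in for a small load helper like load_val_as: reads one
// word from a possibly unaligned address (memcpy avoids alignment UB).
template <typename Uint>
FORCE_INLINE Uint load_word(const uint8_t* p) {
  Uint v;
  std::memcpy(&v, p, sizeof(Uint));
  return v;
}

// Unpacks 1-bit values (LSB first, little-endian host assumed) into bools.
// The helper is called once per 32 output values, so a non-inlined call
// here would add call overhead to every iteration of the hot loop.
void unpack_bits_to_bool(const uint8_t* in, bool* out, std::size_t n_words) {
  assert(in != nullptr && out != nullptr);
  for (std::size_t w = 0; w < n_words; ++w) {
    uint32_t word = load_word<uint32_t>(in + 4 * w);
    for (int b = 0; b < 32; ++b) {
      out[32 * w + b] = ((word >> b) & 1u) != 0;
    }
  }
}
```

If the Neon hotspot list no longer shows a separate `load_val_as` entry after a change along these lines, the remaining difference can be attributed to the kernels themselves.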