cyb70289 commented on PR #49756:
URL: https://github.com/apache/arrow/pull/49756#issuecomment-4310376121

   > But it's impressive that SVE128 is always significantly better than NEON. 
That's rather good news, given that most SVE implementations have 128-bit 
vectors. @cyb70289
   
   Interesting. This should not happen if both code paths use equivalent SIMD 
operations.
   I tested one case, `BM_UnpackBool/{Neon,Sve128}Unaligned/1/32`, on a Neoverse 
N2 server, and SVE reaches about 1.6x the throughput of Neon. But the profile 
shows that the Neon code does not inline frequently called functions such as 
`load_val_as`, which introduces high overhead.
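
To illustrate the kind of overhead described above, here is a minimal sketch of forcing a small, hot load helper to be inlined into its caller. This is not the Arrow or xsimd code: `SKETCH_FORCE_INLINE`, `load_le`, and `unpack_bits` are hypothetical names made up for this example, and whether a force-inline attribute is the right fix for the Neon path is an assumption that would need to be checked against the profile.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro for this sketch; not taken from Arrow or xsimd.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Hypothetical stand-in for a small load helper that is called once per
// iteration of a hot loop. It assembles a little-endian integer from bytes,
// so the input pointer may be unaligned.
template <typename Uint>
SKETCH_FORCE_INLINE Uint load_le(const std::uint8_t* in) {
  Uint v = 0;
  for (std::size_t k = 0; k < sizeof(Uint); ++k) {
    v |= static_cast<Uint>(in[k]) << (8 * k);
  }
  return v;
}

// Unpack `n` 1-bit values into bools, least significant bit first.
// Assumes `n` is a multiple of 32 to keep the sketch short.
inline void unpack_bits(const std::uint8_t* in, bool* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i += 32) {
    // If the helper were emitted as a separate function, every iteration
    // would pay for a call and return here.
    const std::uint32_t word = load_le<std::uint32_t>(in + i / 8);
    for (unsigned b = 0; b < 32; ++b) {
      out[i + b] = ((word >> b) & 1u) != 0;
    }
  }
}
```

With the helper folded into the loop body, it should no longer appear as a separate symbol in a profile, which is what the SVE128 profile below looks like.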
   
   **Benchmark**
   ```
   BM_UnpackBool/NeonUnaligned/1/32              11.1 ns         11.0 ns     63329847 items_per_second=2.89667G/s
   BM_UnpackBool/Sve128Unaligned/1/32            7.02 ns         7.02 ns     99743767 items_per_second=4.55851G/s
   ```
   
   **Neon hotspot shows load_val_as is not inlined**
   ```
   +   45.20%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_width<1, arrow::internal::bpacking::KernelNeon, bool>(unsigned char const
   +   31.03%  arrow-bpacking-  libarrow.so.2500.0.0      [.] xsimd::batch<unsigned char, xsimd::neon64> arrow::internal::bpacking::load_val_as<unsigned int, xsimd::neon64>(u
   +    7.08%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::MediumKernel<arrow::internal::bpacking::KernelTraits<bool, 1, xsimd::neon64>, ar
   +    6.38%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::MediumKernel<arrow::internal::bpacking::KernelTraits<bool, 1, xsimd::neon64>, ar
   +    2.63%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_neon<bool>(unsigned char const*, bool*, arrow::internal::UnpackOptions co
   +    1.95%  arrow-bpacking-  arrow-bpacking-benchmark  [.] arrow::internal::(anonymous namespace)::BM_UnpackBool(benchmark::State&, bool, void (*)(unsigned char const*, bo
   +    1.82%  arrow-bpacking-  libarrow.so.2500.0.0      [.] xsimd::batch<unsigned char, xsimd::neon64> arrow::internal::bpacking::load_val_as<unsigned int, xsimd::neon64>(u
   +    1.37%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_width<1, arrow::internal::bpacking::KernelNeon, bool>(unsigned char const
   ```
   
   **No such issue in sve128 code path**
   ```
   +   89.89%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_width<1, arrow::int
   +    4.18%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_sve128<bool>(unsign
   +    3.11%  arrow-bpacking-  arrow-bpacking-benchmark  [.] arrow::internal::(anonymous namespace)::BM_UnpackBool(benc
   +    1.47%  arrow-bpacking-  libarrow.so.2500.0.0      [.] void arrow::internal::bpacking::unpack_width<1, arrow::int
   ```

