cyb70289 commented on pull request #11864:
URL: https://github.com/apache/arrow/pull/11864#issuecomment-997744819


   I compared the performance of defining std::random_device on the **stack** vs 
as **static thread_local**, on x86 and Arm, with array lengths 1/64/1024/65536.
   
   There's no obvious difference on x86, but a big difference on Arm for small 
arrays (about 2.9x at length 1), which fades by length 65536. The reason is 
that libstdc++ uses the Intel RDRAND hardware instruction on x86, but has to 
read /dev/urandom on other platforms.
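
To make the two variants concrete, here is a minimal sketch of what is being compared. It is not the actual Arrow kernel: the function names `RandomOnStack` and `RandomThreadLocal` are hypothetical, and `std::mt19937_64` is only a stand-in for whatever engine the kernel really seeds.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Variant A (hypothetical name): a fresh std::random_device on the stack per call.
// Its constructor and operator() may hit /dev/urandom each time when no
// hardware RNG is used by the standard library.
std::vector<double> RandomOnStack(std::size_t length) {
  std::random_device rd;
  std::mt19937_64 gen(rd());  // stand-in engine, an assumption of this sketch
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<double> out(length);
  for (auto& v : out) v = dist(gen);
  return out;
}

// Variant B (hypothetical name): one std::random_device per thread,
// constructed on first use and reused by later calls on that thread.
std::vector<double> RandomThreadLocal(std::size_t length) {
  static thread_local std::random_device rd;
  std::mt19937_64 gen(rd());  // still reseeded on every call
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<double> out(length);
  for (auto& v : out) v = dist(gen);
  return out;
}
```

In both variants the per-call setup cost is fixed, so it dominates at length 1 and is amortized away at length 65536, which matches the shape of the numbers below.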
   
   Despite the microbenchmark result, I think it's unlikely to be a real issue 
in practice, and `static thread_local` doesn't look like a good fix to me. 
Maybe we can think about replacing std::random_device. [1][2]
   
   [1] https://www.pcg-random.org/posts/cpps-random_device.html
   [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
   
   - No obvious difference on x86 (clang, libstdc++)
   ```
   on stack
   RandomKernelSystem/1           2183 ns         2183 ns       314432 items_per_second=458.128k/s
   RandomKernelSystem/64          2403 ns         2403 ns       294665 items_per_second=26.6355M/s
   RandomKernelSystem/1024        6588 ns         6588 ns       105727 items_per_second=155.446M/s
   RandomKernelSystem/65536     284906 ns       284906 ns         2454 items_per_second=230.027M/s
   
   as static thread_local
   RandomKernelSystem/1           1972 ns         1972 ns       354208 items_per_second=506.995k/s
   RandomKernelSystem/64          2176 ns         2176 ns       319910 items_per_second=29.4171M/s
   RandomKernelSystem/1024        6672 ns         6672 ns       104292 items_per_second=153.474M/s
   RandomKernelSystem/65536     309043 ns       309038 ns         2265 items_per_second=212.065M/s
   ```
   
   - Big difference on arm (clang, libstdc++)
   ```
   on stack
   RandomKernelSystem/1           7904 ns         7857 ns        83142 items_per_second=127.277k/s
   RandomKernelSystem/64          7963 ns         7915 ns        88657 items_per_second=8.08625M/s
   RandomKernelSystem/1024       12271 ns        12219 ns        57058 items_per_second=83.8037M/s
   RandomKernelSystem/65536     307681 ns       307668 ns         2276 items_per_second=213.009M/s
   
   as static thread_local
   RandomKernelSystem/1           2743 ns         2742 ns       255454 items_per_second=364.658k/s
   RandomKernelSystem/64          2936 ns         2936 ns       238321 items_per_second=21.799M/s
   RandomKernelSystem/1024        7317 ns         7316 ns        95515 items_per_second=139.959M/s
   RandomKernelSystem/65536     299196 ns       299187 ns         2337 items_per_second=219.047M/s
   ```


