> From: Reshetova, Elena
> > Sent: 03 May 2019 17:17
> ...
> > rdrand (calling every 8 syscalls): Simple syscall: 0.0795 microseconds
>
> You could try something like:
> 	u64 rand_val = cpu_var->syscall_rand;
>
> 	while (unlikely(rand_val == 0))
> 		rand_val = rdrand64();
>
> 	stack_offset = rand_val & 0xff;
> 	rand_val >>= 6;
> 	if (likely(rand_val >= 4))
> 		cpu_var->syscall_rand = rand_val;
> 	else
> 		cpu_var->syscall_rand = rdrand64();
>
> 	return stack_offset;
>
> That gives you 10 system calls per rdrand instruction
> and mostly takes the latency out of line.
I am not really happy going down the rdrand path, for a couple of reasons:

- it is not available on older PCs
- its performance varies across the CPUs that do support it (and, as I
  understand it, varies quite a lot)
- it is x86-centric and not generic

So, if we can use the get_random_bytes() interface without tying
ourselves to a particular instruction, I think that would be better.

The numbers I have measured so far, for a buffer size of 4096, are for
the software-only implementation. Today I will try to measure what boost
(if any) we can get by using SIMD code for it.

Best Regards,
Elena.

