On 2018-07-26 09:25:40 [+0200], Ard Biesheuvel wrote:
> Thanks a lot.
> 
> So 20 us ~= 20,000 cycles on my 1 GHz Cortex-A53, and if I am
> understanding you correctly, you wouldn't mind the quantum of work to
> be in the order 16,000 cycles or even substantially more?

I currently have just that one box, and it does not seem to be a problem.
It now reports a maximum of around 20us when idle. So if we add "only"
20us for the NEON / preempt-disable section, we may end up at
20+20 = 40us. At this point I am not sure how "bad" that is. It works, it
does not seem like that much, and you can disable it if you don't want
the extra 20us here.
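
To make the arithmetic explicit, here is a rough sketch (not a
measurement). It assumes the 1 GHz clock from the quoted mail, and the
helper names are made up for illustration:

```python
# Back-of-the-envelope latency budget; all names here are hypothetical.

def us_to_cycles(us, freq_hz):
    """Convert microseconds to CPU cycles at the given clock frequency."""
    return round(us * freq_hz / 1_000_000)

def cycles_to_us(cycles, freq_hz):
    """Convert CPU cycles to microseconds at the given clock frequency."""
    return cycles * 1_000_000 / freq_hz

FREQ_HZ = 1_000_000_000   # assumption: 1 GHz Cortex-A53, as in the quote
IDLE_MAX_US = 20          # maximum latency reported on the idle box
EXTRA_SECTION_US = 20     # assumed length of the added preempt-disable section

# Worst case if the extra section simply adds to the idle maximum.
worst_case_us = IDLE_MAX_US + EXTRA_SECTION_US

print(us_to_cycles(IDLE_MAX_US, FREQ_HZ), "cycles in the idle maximum")
print(worst_case_us, "us worst case")
```

The simple sum assumes the two delays can stack fully, which is the
pessimistic case.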

> That is good news, but it is also rather interesting, given that these
> algorithms run at ~4 cycles per byte, meaning that you'd manage an
> entire 4 KB page without ever yielding. (GCM is used on network
> packets, XTS on disk sectors which are all smaller than that)
> 
> Do you remember how you found out NEON use is a problem for -rt on
> arm64 in the first place? Which algorithm did you test at the time to
> arrive at this conclusion?

I *think* that yield got in there by chance. The main problem at the
time was that the scatterlist walk sat inside the neon begin/end
section. That walk may invoke kmap() / kmalloc() / kfree(), which is not
allowed on RT within a preempt-disable section. This was my main
concern.

> Note that AES-GCM using ordinary SIMD instructions runs at 29 cpb, and
> plain AES at ~20 (on A53), so perhaps it would make sense to
> distinguish between algos using crypto instructions and ones using
> plain SIMD.
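
For scale, a rough sketch of what the cycles-per-byte figures quoted in
this thread would mean for one 4 KB buffer processed without yielding.
It assumes a 1 GHz clock and treats the approximate cpb numbers as exact;
the labels and names are only for illustration:

```python
# Rough conversion of cycles-per-byte into time per buffer.
# Assumptions: 1 GHz clock, 4 KB buffer, cpb figures taken at face value.

FREQ_HZ = 1_000_000_000
BUF_BYTES = 4096

# Labels are illustrative; the numbers are the ones quoted in the thread.
CPB = {
    "crypto instructions (~4 cpb)": 4,
    "plain AES (~20 cpb)": 20,
    "AES-GCM, ordinary SIMD (29 cpb)": 29,
}

def buffer_cycles(cpb, nbytes=BUF_BYTES):
    """Cycles needed to process nbytes at the given cycles-per-byte rate."""
    return cpb * nbytes

def buffer_us(cpb, nbytes=BUF_BYTES, freq_hz=FREQ_HZ):
    """Time in microseconds to process nbytes at the given rate and clock."""
    return buffer_cycles(cpb, nbytes) * 1_000_000 / freq_hz

for label, cpb in CPB.items():
    print(f"{label}: {buffer_cycles(cpb)} cycles, {buffer_us(cpb):.1f} us")
```

Under these assumptions only the crypto-instruction case stays below the
20us seen on idle; the plain SIMD cases come out several times larger.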

I was looking at AES-CE and AES-NEON (aes-neon-blk / aes_ce_blk) with
        modprobe tcrypt mode=200 sec=1

and mode=403 / mode=404 for the sha1 / sha256 tests.

Sebastian
