Vakul reports a considerable performance hit when running the accelerated
arm64 crypto routines with CONFIG_PREEMPT=y configured, now that they have
been updated to take the TIF_NEED_RESCHED flag into account.

The issue appears to be caused by the fact that Cortex-A53, the core in
question, has a high-end implementation of the Crypto Extensions combined
with a shallow pipeline, which means that even sequential algorithms that
would be held back by pipeline stalls on high-end out-of-order cores run
at maximum speed on this core. This means SHA-1, SHA-2, GHASH and AES in
GCM and CCM modes run at on the order of 2 to 4 cycles per byte, and are
currently implemented to check the TIF_NEED_RESCHED flag after each
iteration, which may process as little as 16 bytes (for GHASH).

Obviously, every cycle of overhead hurts in this context, and given that
the A53's load/store unit is not quite as high-end, any delays caused by
memory accesses in the inner loop of these algorithms are going to be
quite significant, hence the performance regression.

So reduce the frequency at which the NEON yield checks are performed, so
that they occur roughly once every 1000 cycles. This is hopefully a
reasonable tradeoff between throughput and worst-case scheduling latency.

Ard Biesheuvel (4):
  crypto/arm64: ghash - reduce performance impact of NEON yield checks
  crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks
  crypto/arm64: sha1 - reduce performance impact of NEON yield checks
  crypto/arm64: sha2 - reduce performance impact of NEON yield checks

 arch/arm64/crypto/aes-ce-ccm-core.S |  3 +++
 arch/arm64/crypto/ghash-ce-core.S   | 12 +++++++++---
 arch/arm64/crypto/sha1-ce-core.S    |  3 +++
 arch/arm64/crypto/sha2-ce-core.S    |  3 +++
 4 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.11.0
