Hi,

I hit the following BUG when running the kcapi-enc-test.sh test from
libkcapi [1] on ppc64/ppc64le with recent kernels:

[  891.863680] BUG: sleeping function called from invalid context at include/crypto/algapi.h:424
[  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
[  891.864739] 1 lock held by kcapi-enc/12347:
[  891.864811]  #0: 00000000f5d42c46 (sk_lock-AF_ALG){+.+.}, at: skcipher_recvmsg+0x50/0x530
[  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
[  891.865251] Call Trace:
[  891.865340] [c0000003387578c0] [c000000000d67ea4] dump_stack+0xe8/0x164 (unreliable)
[  891.865511] [c000000338757910] [c000000000172a58] ___might_sleep+0x2f8/0x310
[  891.865679] [c000000338757990] [c0000000006bff74] blkcipher_walk_done+0x374/0x4a0
[  891.865825] [c0000003387579e0] [d000000007e73e70] p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
[  891.865993] [c000000338757ad0] [c0000000006c0ee0] skcipher_encrypt_blkcipher+0x60/0x80
[  891.866128] [c000000338757b10] [c0000000006ec504] skcipher_recvmsg+0x424/0x530
[  891.866283] [c000000338757bd0] [c000000000b00654] sock_recvmsg+0x74/0xa0
[  891.866403] [c000000338757c10] [c000000000b00f64] ___sys_recvmsg+0xf4/0x2f0
[  891.866515] [c000000338757d90] [c000000000b02bb8] __sys_recvmsg+0x68/0xe0
[  891.866631] [c000000338757e30] [c00000000000bbe4] system_call+0x5c/0x70

This is on the 4.19.0-0.rc0.git3.1.fc30.ppc64le kernel from current Fedora
rawhide, but the same happens on the Koji builders (while building
libkcapi and running its tests), which run 4.17.* kernels. The BUG
becomes more likely to trigger as the message length goes up
(usually it starts at 65535 bytes, but sometimes even earlier).
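
For anyone who wants to poke at this without building libkcapi, here is a
minimal sketch of a userspace exerciser built on Python's AF_ALG socket
support. It is not what kcapi-enc-test.sh does and it is not confirmed to
trigger the BUG; it only pushes messages of growing size through cbc(aes) via
an AF_ALG skcipher socket, which goes through skcipher_recvmsg() in the kernel.
The helper name afalg_cbc_aes, the all-zero key and IV, and the size list are
assumptions made for illustration. The sizes are multiples of the AES block
size, so it uses 65536 rather than 65535 bytes.

```python
import socket

KEY = bytes(16)  # placeholder all-zero AES-128 key (assumption, not from the test suite)
IV = bytes(16)   # placeholder all-zero IV (assumption)


def afalg_cbc_aes(data, encrypt=True):
    """Run data through the kernel's cbc(aes) via AF_ALG.

    Returns the output bytes, or None if AF_ALG is not usable here.
    """
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    op = socket.ALG_OP_ENCRYPT if encrypt else socket.ALG_OP_DECRYPT
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("skcipher", "cbc(aes)"))
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, KEY)
            req, _ = alg.accept()
            with req:
                # Queue the input, then read the result back; the recv()
                # is the call that ends up in skcipher_recvmsg().
                req.sendmsg_afalg([data], op=op, iv=IV)
                return req.recv(len(data))
    except OSError:
        return None  # kernel lacks AF_ALG or the requested algorithm


if __name__ == "__main__":
    for size in (16, 4096, 65536):
        out = afalg_cbc_aes(bytes(size))
        print(size, "unavailable" if out is None else len(out))
```

Which driver ends up serving cbc(aes) depends on the machine (vmx_crypto on
the POWER8 box above), so watch dmesg while this runs.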

Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
holds the socket lock the whole time and yet passes
CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
wrong?

I don't have much knowledge about the atomic context stuff in the
Linux kernel, but the dmesg output seems to imply that holding the
socket lock is what makes the context atomic, which would be why the
cipher implementation shouldn't be allowed to sleep here. Perhaps
_skcipher_recvmsg() could release the lock before invoking the cipher
operation? AFAIK the operation only needs to access the allocated
data, which shouldn't be accessed by other tasks anyway.

[1] https://github.com/smuellerDD/libkcapi/tree/master/test

Thanks,
Ondrej