There is a bug in assoc_sk_only_mismatch() and
assoc_sk_only_mismatch_tx() that creates a race condition which triggers
test flakes in later test cases, e.g. data_send_bad_key().

The problem is that the client uses the "conn clr" RPC to set up a data
connection with psp_responder, but never uses a matching "data close"
RPC. This creates a race condition: if the client can queue another data
sock request, like in data_send_bad_key(), before the server can accept
the old connection from the backlog, we end up with two connections in
the backlog: one for the closed connection we have received a FIN for,
and one for the new PSP connection, which is expecting to do key
exchange. From there the server pops the closed connection from the
backlog, but the data_send_bad_key() test case in psp.py hangs waiting
to perform key exchange.

The fix is to properly use _close_conn(), which will force the server to
remove the closed connection from the backlog before sending the RPC ack
to the client.

Signed-off-by: Daniel Zahka <[email protected]>
---
The data_send_bad_key() test case has been flaking in automated testing.
The root cause is actually some racy connection setup/teardown logic
between the client and server in the preceding test cases. I have
detailed the exact circumstances for the test failure in the commit.
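
The backlog ordering behind the race can be seen outside of the PSP
tests with plain loopback TCP sockets. The following is only a minimal
sketch, not code from psp.py or psp_responder.c: backlog_order_demo()
and its variable names are hypothetical, and it assumes a Linux-style
accept queue that hands out completed connections in arrival order.

```python
import socket


def backlog_order_demo():
    """Illustrative only: a connection closed before accept() stays queued."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(8)
    srv.settimeout(5)
    addr = srv.getsockname()

    # "Old" connection: opened and closed before the server accepts it,
    # similar to a clear-text data socket that is never explicitly closed
    # through a matching RPC.
    old = socket.create_connection(addr)
    old_port = old.getsockname()[1]
    old.close()

    # "New" connection: the client already expects the server to act on it.
    new = socket.create_connection(addr)
    new_port = new.getsockname()[1]
    new.sendall(b"hello")

    # The server now drains its backlog and records what it finds.
    results = []
    for _ in range(2):
        conn, peer = srv.accept()
        conn.settimeout(5)
        results.append((peer[1], conn.recv(16)))
        conn.close()

    new.close()
    srv.close()
    return old_port, new_port, results


if __name__ == "__main__":
    print(backlog_order_demo())
```

Under the stated assumption, the first accept() returns the stale
connection and its recv() reports end of stream, while the connection
the client actually cares about is only reached on the second accept().
A server that accepts once per request therefore services the wrong
socket unless the stale one has been drained first.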

To reproduce the issue deterministically, I inserted a sleep into the
psp_responder.c conn clr handler:

	if (cmd("conn clr")) {
		if (accept_cfg != ACCEPT_CFG_NONE)
			fprintf(stderr, "WARN: old conn config still set!\n");
		accept_cfg = ACCEPT_CFG_CLEAR;
		send_ack(comm_sock);
+		sleep(1);
	}

which produces the following error when running just two tests:

1..2
ok 1 psp.assoc_sk_only_mismatch
# Exception| Traceback (most recent call last):
# Exception|   File "/data/users/dzahka/psp-flaky-test/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
# Exception|     func(*args)
# Exception|   File "/data/users/dzahka/psp-flaky-test/./tools/testing/selftests/drivers/net/psp.py", line 420, in data_send_bad_key
# Exception|     tx = _spi_xchg(s, rx)
# Exception|   File "/data/users/dzahka/psp-flaky-test/./tools/testing/selftests/drivers/net/psp.py", line 65, in _spi_xchg
# Exception|     tx = s.recv(4 + len(rx['key']))
# Exception|   File "/data/users/dzahka/psp-flaky-test/tools/testing/selftests/net/lib/py/ksft.py", line 258, in _ksft_intr
# Exception|     raise KsftTerminate()
# Exception| net.lib.py.ksft.KsftTerminate
# Stopping tests due to KsftTerminate.
not ok 2 psp.data_send_bad_key
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
#
# Responder logs (-15):
# STDERR:
# Set PSP enable on device 3 to 0xf
# DEBUG: ...
# DEBUG: command: conn clr
# DEBUG: ...
# DEBUG: command: conn psp
# WARN: old conn config still set!
# DEBUG: new data sock: psp
# DEBUG: create PSP connection
# DEBUG: ...
# DEBUG: data sock closed
# DEBUG: ...
# WARN: new data sock but no config
# DEBUG: ...
# DEBUG: data read 20
# DEBUG: ...
Traceback (most recent call last):

The problem is caused by the conn clr and conn psp RPC handlers running
consecutively without the first connection being accepted and closed by
the server.

The fix is simply to match all conn clr commands with a data close RPC.
This forces the trace to be:

# Set PSP enable on device 3 to 0xf
# DEBUG: ...
# DEBUG: command: conn clr
# DEBUG: ...
# DEBUG: command: data close
# DEBUG: new data sock: clear
# DEBUG: ...
# DEBUG: command: conn psp
# DEBUG: ...
# DEBUG: new data sock: psp
# DEBUG: create PSP connection

So the closed connection from the conn clr is removed from the backlog
before the ack for data close is sent to the client.
---
 tools/testing/selftests/drivers/net/psp.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/psp.py b/tools/testing/selftests/drivers/net/psp.py
index 528a421ecf76..864d9fce1094 100755
--- a/tools/testing/selftests/drivers/net/psp.py
+++ b/tools/testing/selftests/drivers/net/psp.py
@@ -266,6 +266,7 @@ def assoc_sk_only_mismatch(cfg):
         the_exception = cm.exception
         ksft_eq(the_exception.nl_msg.extack['bad-attr'], ".dev-id")
         ksft_eq(the_exception.nl_msg.error, -errno.EINVAL)
+        _close_conn(cfg, s)
 
 
 def assoc_sk_only_mismatch_tx(cfg):
@@ -283,6 +284,7 @@ def assoc_sk_only_mismatch_tx(cfg):
         the_exception = cm.exception
         ksft_eq(the_exception.nl_msg.extack['bad-attr'], ".dev-id")
         ksft_eq(the_exception.nl_msg.error, -errno.EINVAL)
+        _close_conn(cfg, s)
 
 
 def assoc_sk_only_unconn(cfg):

---
base-commit: a8a6c8cc8796ac573fb3902803da28cfa374787c
change-id: 20260126-psp-flaky-test-ea613ea5386c

Best regards,
-- 
Daniel Zahka <[email protected]>

