On 1 May 2018 22:05, Dave Watson wrote:
> It is reported that in some cases, write_space may be called in
> do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again:
>
> [ 660.468802] ? do_tcp_sendpages+0x8d/0x580
> [ 660.468826] ? tls_push_sg+0x74/0x130 [tls]
> [ 660.468852] ? tls_push_record+0x24a/0x390 [tls]
> [ 660.468880] ? tls_write_space+0x6a/0x80 [tls]
> ...
>
> tls_push_sg already does a loop over all sending sg's, so ignore
> any tls_write_space notifications until we are done sending.
> We then have to call the previous write_space to wake up
> poll() waiters after we are done with the send loop.
>
> Reported-by: Andre Tomt <an...@tomt.net>
> Signed-off-by: Dave Watson <davejwat...@fb.com>
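
Unfortunately, this patch seems to have a bug: it fixes the kernel
crash, but it causes some connections in my testbed to stall.

Clearing ctx->in_tcp_sendpages before the early "return ret" inside the
while (1) loop as well seems to fix it for me. My guess is that, with
the flag left set on that path, later tls_write_space notifications keep
being ignored, so the partially sent record is never pushed out.
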
diff -Naurp a/net/tls/tls_main.c b/net/tls/tls_main.c
--- a/net/tls/tls_main.c 2018-05-06 02:23:41.638597066 +0200
+++ b/net/tls/tls_main.c 2018-05-06 01:59:14.378568139 +0200
@@ -135,6 +135,7 @@ retry:
 			offset -= sg->offset;
 			ctx->partially_sent_offset = offset;
 			ctx->partially_sent_record = (void *)sg;
+			ctx->in_tcp_sendpages = false;
 			return ret;
 		}
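
To illustrate why the early return matters, here is a minimal userspace
sketch of the guard-flag pattern. It is not the kernel code: demo_ctx,
demo_push, demo_write_space, the budget parameter and the
clear_on_early_return switch are all hypothetical names made up for this
example, and the -1 return only stands in for a short or failed send.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TLS context: only the guard flag and a
 * count of chunks still waiting to be sent are modeled. */
struct demo_ctx {
	bool in_tcp_sendpages;
	int pending;
};

/* Send loop. 'budget' is how many chunks the socket accepts before it
 * reports that it is full. Returns 0 when everything was sent, -1 on
 * the early return that leaves a partially sent record behind. */
static int demo_push(struct demo_ctx *ctx, int budget,
		     bool clear_on_early_return)
{
	ctx->in_tcp_sendpages = true;
	while (ctx->pending > 0) {
		if (budget == 0) {
			/* The one-line fix: drop the guard on this path too. */
			if (clear_on_early_return)
				ctx->in_tcp_sendpages = false;
			return -1;
		}
		budget--;
		ctx->pending--;
	}
	ctx->in_tcp_sendpages = false;
	return 0;
}

/* Write-space notification: ignored while the send loop is marked as
 * running, otherwise it resumes sending what is still pending. */
static int demo_write_space(struct demo_ctx *ctx, int budget,
			    bool clear_on_early_return)
{
	if (ctx->in_tcp_sendpages)
		return 0;
	return demo_push(ctx, budget, clear_on_early_return);
}

/* Compares the two behaviours on the same input. */
static void demo_selfcheck(void)
{
	struct demo_ctx fixed = { false, 4 };
	struct demo_ctx stuck = { false, 4 };

	/* Guard cleared on the early return: the next notification
	 * resumes the send and drains the remaining chunks. */
	assert(demo_push(&fixed, 2, true) == -1);
	assert(demo_write_space(&fixed, 2, true) == 0);
	assert(fixed.pending == 0);

	/* Guard left set: the notification is ignored and the remaining
	 * chunks are never sent, which looks like a stalled connection. */
	assert(demo_push(&stuck, 2, false) == -1);
	assert(demo_write_space(&stuck, 2, false) == 0);
	assert(stuck.pending == 2);
}
```

Calling demo_selfcheck() from a small main() runs both cases. The point
is only that every exit from the guarded loop has to clear the flag,
otherwise the notification handler is disabled for good.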