This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 04e584c62e52f8d196ddbba93783007d2fc02a01
Author: Will Berkeley <[email protected]>
AuthorDate: Thu Mar 14 17:16:33 2019 -0700

    Increase timeout in tls_socket-test

    Very rarely (~3/2000 times in TSAN with 8 stress threads),
    tls_socket-test will fail with a log like the following:

    I0314 19:20:54.118880 236 tls_socket-test.cc:109] server: negotiation complete
    I0314 19:20:54.119151 223 tls_socket-test.cc:109] client: negotiation complete
    I0314 19:21:04.127199 236 tls_socket-test.cc:165] server echoing 33406976 bytes
    /data/6/wdberkeley/kudu/src/kudu/security/tls_socket-test.cc:234: Failure
    Failed
    Bad status: Network error: BlockingRecv error: failed to read from TLS socket (remote: unknown): Connection reset by peer (error 104)

    It seems the following is happening:

    1. The client and the echo server connect successfully.
    2. The client sends its payload of 32MiB (33554432 bytes) in
       BlockingWrite.
    3. The server, while looping in BlockingRecv to receive the payload,
       fails to read the whole payload before timing out, through some
       combination of resource saturation, unfavorable scheduling, and
       EINTR returns from recv. Notice the 10 second delay between the
       second and third messages (the timeout is 10s) and that the number
       of bytes being echoed (33406976) is less than 32MiB.
    4. The server terminates the connection because of the timeout, but
       this does not result in a failure on its side because the server
       was stopped by the client.
    5. The client fails when it first tries to BlockingRecv from the
       closed connection, instead of on the second BlockingRecv as the
       test intends.

    This seems like a test-only issue: the timeout on the server side
    seems like reasonable behavior. Since the failure is so rare, tripling
    the timeout should make the issue stop or at least make it much, much
    rarer. With a 10s timeout, 2000 runs on TSAN, and 8 stress threads, I
    saw 2-4 failures. With a 30s timeout, I see 0.

    Change-Id: Ibc615ea8f03a74f38b2bd6f3b4c140b3e435d4f3
    Reviewed-on: http://gerrit.cloudera.org:8080/12761
    Reviewed-by: Alexey Serbin <[email protected]>
    Tested-by: Kudu Jenkins
    Reviewed-by: Adar Dembo <[email protected]>
---
 src/kudu/security/tls_socket-test.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/kudu/security/tls_socket-test.cc b/src/kudu/security/tls_socket-test.cc
index f609ce3..b88cdf4 100644
--- a/src/kudu/security/tls_socket-test.cc
+++ b/src/kudu/security/tls_socket-test.cc
@@ -58,7 +58,7 @@ using std::vector;
 namespace kudu {
 namespace security {
 
-const MonoDelta kTimeout = MonoDelta::FromSeconds(10);
+const MonoDelta kTimeout = MonoDelta::FromSeconds(30);
 
 // Size is big enough to not fit into output socket buffer of default size
 // (controlled by setsockopt() with SO_SNDBUF).
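
For illustration only, step 3 of the commit message (a receive loop that
gives up at a deadline and can therefore return a short read) can be
sketched as follows. This is not Kudu's TlsSocket code: it assumes a plain
POSIX stream socket, and the names BlockingRecvSketch and RecvResult are
hypothetical. The point it shows is that a retry on EINTR does not extend
the deadline, so a slow or starved reader can time out with fewer bytes than
the peer sent.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical result type for this sketch; Kudu itself reports a Status.
enum class RecvResult { kOk, kTimedOut, kClosed, kError };

// Tries to read exactly `len` bytes from `fd`, giving up at `deadline`.
// `*nread` reports how many bytes arrived, including on failure.
RecvResult BlockingRecvSketch(int fd, uint8_t* buf, size_t len,
                              std::chrono::steady_clock::time_point deadline,
                              size_t* nread) {
  assert(buf != nullptr && nread != nullptr);
  *nread = 0;
  while (*nread < len) {
    const auto now = std::chrono::steady_clock::now();
    if (now >= deadline) return RecvResult::kTimedOut;

    // Wait for readability, at most 1s per poll() call so the
    // millisecond count always fits in an int.
    const auto left_ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now)
            .count();
    const int wait_ms =
        static_cast<int>(std::min<long long>(left_ms + 1, 1000));
    struct pollfd pfd = {fd, POLLIN, 0};
    const int ready = poll(&pfd, 1, wait_ms);
    if (ready < 0) {
      if (errno == EINTR) continue;  // Retry; the deadline is unchanged.
      return RecvResult::kError;
    }
    if (ready == 0) continue;  // Nothing yet; the loop re-checks the deadline.

    const ssize_t n = recv(fd, buf + *nread, len - *nread, 0);
    if (n < 0) {
      if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) continue;
      return RecvResult::kError;  // For example ECONNRESET (error 104).
    }
    if (n == 0) return RecvResult::kClosed;  // Peer closed the connection.
    *nread += static_cast<size_t>(n);
  }
  return RecvResult::kOk;
}
```

In this sketch a caller that asks for the full 32MiB payload and gets
kTimedOut would see *nread smaller than the requested length, which matches
the "server echoing 33406976 bytes" line in the log above.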
