masaori335 opened a new pull request #6841: URL: https://github.com/apache/trafficserver/pull/6841
It looks like ATS retries "writing" instead of "reading" when it gets reading error on outgoing TLS connection. I found that ATS keep trying to write after sending CLIENT HELLO on TLS Handshake to the origin server in spite of nothing to write. The loop never ends until the handshake is completed. ``` [Jun 2 14:13:49.959] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1412 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:13:49.960] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1451 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_WRITE ... ``` This should be `SSL_ERROR_WANT_READ` and wait for reading until ATS receives SERVER HELLO from the origin server. This becomes below with this change. ``` [Jun 2 14:48:12.284] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1405 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:48:12.284] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1448 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_READ [Jun 2 14:48:12.284] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1405 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:48:12.284] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1448 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_READ [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1405 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLClientUtils.cc:44 (verify_callback)> (ssl-diag) Entered verify cb [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLClientUtils.cc:44 (verify_callback)> (ssl-diag) Entered verify cb [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1448 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_READ [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:608 (net_read_io)> (ssl) ssl handshake for vc 0x7f8731d61b20, took 0.000 seconds, configured handshake_timer: 30 [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1405 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:48:12.286] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1448 (sslClientHandShakeEvent)> (ssl.error) SSL_ERROR_WANT_READ [Jun 2 14:48:12.287] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1405 (sslClientHandShakeEvent)> (ssl) Initialize outbound connect curHook from NULL [Jun 2 14:48:12.287] [ET_NET 0] DEBUG: <SSLNetVConnection.cc:1424 (sslClientHandShakeEvent)> (ssl) SSL client handshake completed successfully ``` Fortunately, we're treating `SSL_ERROR_WANT_WRITE` and `SSL_ERROR_WANT_READ` as almost the same on `ssl_read_from_net()`. Probably, we have issues only on TLS Handshake to the origin server. This bug is introduced by https://github.com/apache/trafficserver/commit/dcbdc684ec7f4cd4e30cca1ced0d891f7cbd1671#diff-cf0fb3a8106cbc3bbcb3734296edd796R102 & released by 7.1.0. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
