[ https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606538#comment-14606538 ]
Sudheer Vinukonda commented on TS-3714: --------------------------------------- Debug logs showing pending data read using SSL_read() right after SSL handshake complete. Without doing an additional SSL_read() at that point, this data would not be read until a subsequent i/o, which may likely not happen quickly enough. {code} [Jun 29 22:18:02.378] Server {0x2ab980504700} ERROR: ssl read right after handshake, read 1218, pending 0 bytes, for vc 0x2abb04a58c10 [Jun 29 22:18:03.442] Server {0x2ab97be71700} ERROR: ssl read right after handshake, read 1202, pending 0 bytes, for vc 0x2abaf46401a0 [Jun 29 22:18:09.150] Server {0x2ab980706700} ERROR: ssl read right after handshake, read 1699, pending 0 bytes, for vc 0x2abadc412460 [Jun 29 22:18:10.230] Server {0x2ab980a09700} ERROR: ssl read right after handshake, read 504, pending 0 bytes, for vc 0x2abadc4284a0 [Jun 29 22:18:13.923] Server {0x2ab97bf72700} ERROR: ssl read right after handshake, read 1411, pending 0 bytes, for vc 0x2aba7c0036b0 [Jun 29 22:18:15.739] Server {0x2ab980100700} ERROR: ssl read right after handshake, read 1370, pending 0 bytes, for vc 0x2abac43d3a60 [Jun 29 22:18:15.845] Server {0x2ab980a09700} ERROR: ssl read right after handshake, read 1841, pending 0 bytes, for vc 0x2abadc426910 [Jun 29 22:18:18.337] Server {0x2ab980a09700} ERROR: ssl read right after handshake, read 2838, pending 0 bytes, for vc 0x2abac43e72c0 [Jun 29 22:18:22.181] Server {0x2ab980a09700} ERROR: ssl read right after handshake, read 897, pending 0 bytes, for vc 0x2aba94011020 [Jun 29 22:18:23.458] Server {0x2ab97b669700} ERROR: ssl read right after handshake, read 1285, pending 0 bytes, for vc 0x2aba7c00d900 [Jun 29 22:18:25.109] Server {0x2ab980504700} ERROR: ssl read right after handshake, read 448, pending 0 bytes, for vc 0x2aba54002d80 {code} > TS seems to stall between ua_begin and ua_first_read on some transactions > resulting in high response times. > ----------------------------------------------------------------------------------------------------------- > > Key: TS-3714 > URL: https://issues.apache.org/jira/browse/TS-3714 > Project: Traffic Server > Issue Type: Bug > Components: Core > Affects Versions: 5.3.0 > Reporter: Sudheer Vinukonda > Assignee: Sudheer Vinukonda > > An example slow log showing very high *ua_first_read*. > {code} > ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url: > http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0 client > state: 0 serv > er state: 9 ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224 > cache_open_rea > d_begin: 42.224 cache_open_read_end: 42.224 dns_lookup_begin: 42.224 > dns_lookup_end: 42.224 > server_connect: 42.224 server_first_read: 42.229 server_read_header_done: > 42.229 server_clos > e: 42.229 ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229 > {code} > Initially, I suspected that it might be caused by browser's connecting early > before sending any bytes to TS. However, this seems to be happening too > frequently and with unrealistically high delay between *ua_begin* and > *ua_first_read*. > I suspect it's caused due to the code that disables the read temporarily > before calling *TXN_START* hook and re-enables it after the API call out. The > read disable is done via a 0-byte *do_io_read* on the client vc, but, the > problem is that a valid *mbuf* is still passed. Based on what I am seeing, > it's possible this results in actually enabling the *read_vio* all the way > uptil *ssl_read_from_net* for instance (if there's a race condition and there > were bytes already from the client resulting in an epoll read event), which > would finally disable the read since, the *ntodo* (nbytes) is 0. However, > this may result in the epoll event being lost until a new i/o happens from > the client. I'm trying out further experiments to confirm the theory. In most > cases, the read buffer already has some bytes by the time the http session > and http sm is created, which makes it just work. But, if there's a slight > delay in the receiving bytes after making a connection, the epoll mechanism > should read it, but, due to the way the temporary read disable is being done, > the event may be lost (this is coz, ATS uses the epoll edge triggered mode). > Some history on this issue - > This issue has been a problem for a long time and affects both http and https > requests. When this issue was first reported, our router operations team > eventually closed it indicating that disabling *accept* threads resolved it > ([~yzlai] also reported similar observations and conclusions). It's possible > that the race condition window may be slightly reduced by disabling accept > threads, but, to the overall performance and through put, accept threads are > very important. I now notice that the issue still exists (regardless of > whether or not accept threads are enabled/disabled) and am testing further to > confirm the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)