[jira] [Commented] (TS-3714) TS seems to stall between ua_begin and ua_first_read on some transactions resulting in high response times.

Sudheer Vinukonda (JIRA) Mon, 29 Jun 2015 15:22:16 -0700

    [ 
https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606538#comment-14606538
 ]


Sudheer Vinukonda commented on TS-3714:
---------------------------------------

Debug logs showing pending data read using SSL_read() right after SSL handshake 
complete. Without doing an additional SSL_read() at that point, this data would 
not be read until a subsequent i/o, which may likely not happen quickly enough.

{code}
[Jun 29 22:18:02.378] Server {0x2ab980504700} ERROR: ssl read right after 
handshake, read 1218, pending 0 bytes, for vc 0x2abb04a58c10
[Jun 29 22:18:03.442] Server {0x2ab97be71700} ERROR: ssl read right after 
handshake, read 1202, pending 0 bytes, for vc 0x2abaf46401a0
[Jun 29 22:18:09.150] Server {0x2ab980706700} ERROR: ssl read right after 
handshake, read 1699, pending 0 bytes, for vc 0x2abadc412460
[Jun 29 22:18:10.230] Server {0x2ab980a09700} ERROR: ssl read right after 
handshake, read 504, pending 0 bytes, for vc 0x2abadc4284a0
[Jun 29 22:18:13.923] Server {0x2ab97bf72700} ERROR: ssl read right after 
handshake, read 1411, pending 0 bytes, for vc 0x2aba7c0036b0
[Jun 29 22:18:15.739] Server {0x2ab980100700} ERROR: ssl read right after 
handshake, read 1370, pending 0 bytes, for vc 0x2abac43d3a60
[Jun 29 22:18:15.845] Server {0x2ab980a09700} ERROR: ssl read right after 
handshake, read 1841, pending 0 bytes, for vc 0x2abadc426910
[Jun 29 22:18:18.337] Server {0x2ab980a09700} ERROR: ssl read right after 
handshake, read 2838, pending 0 bytes, for vc 0x2abac43e72c0
[Jun 29 22:18:22.181] Server {0x2ab980a09700} ERROR: ssl read right after 
handshake, read 897, pending 0 bytes, for vc 0x2aba94011020
[Jun 29 22:18:23.458] Server {0x2ab97b669700} ERROR: ssl read right after 
handshake, read 1285, pending 0 bytes, for vc 0x2aba7c00d900
[Jun 29 22:18:25.109] Server {0x2ab980504700} ERROR: ssl read right after 
handshake, read 448, pending 0 bytes, for vc 0x2aba54002d80
{code}

> TS seems to stall between ua_begin and ua_first_read on some transactions 
> resulting in high response times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: TS-3714
>                 URL: https://issues.apache.org/jira/browse/TS-3714
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 5.3.0
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>
> An example slow log showing very high *ua_first_read*.
> {code}
> ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url: 
> http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0 client 
> state: 0 serv
> er state: 9 ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224 
> cache_open_rea
> d_begin: 42.224 cache_open_read_end: 42.224 dns_lookup_begin: 42.224 
> dns_lookup_end: 42.224
> server_connect: 42.224 server_first_read: 42.229 server_read_header_done: 
> 42.229 server_clos
> e: 42.229 ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229
> {code}
> Initially, I suspected that it might be caused by browser's connecting early 
> before sending any bytes to TS. However, this seems to be happening too 
> frequently and with unrealistically high delay between *ua_begin* and 
> *ua_first_read*.
> I suspect it's caused due to the code that disables the read temporarily 
> before calling *TXN_START* hook and re-enables it after the API call out. The 
> read disable is done via a 0-byte *do_io_read* on the client vc, but, the 
> problem is that a valid *mbuf* is still passed. Based on what I am seeing, 
> it's possible this results in actually enabling the *read_vio* all the way 
> uptil *ssl_read_from_net* for instance (if there's a race condition and there 
> were bytes already from the client resulting in an epoll read event), which 
> would finally disable the read since, the *ntodo* (nbytes) is 0. However, 
> this may result in the epoll event being lost until a new i/o happens from 
> the client. I'm trying out further experiments to confirm the theory. In most 
> cases, the read buffer already has some bytes by the time the http session 
> and http sm is created, which makes it just work. But, if there's a slight 
> delay in the receiving bytes after making a connection, the epoll mechanism 
> should read it, but, due to the way the temporary read disable is being done, 
> the event may be lost (this is coz, ATS uses the epoll edge triggered mode).
> Some history on this issue - 
> This issue has been a problem for a long time and affects both http and https 
> requests. When this issue was first reported, our router operations team 
> eventually closed it indicating that disabling *accept* threads resolved it 
> ([~yzlai] also reported similar observations and conclusions). It's possible 
> that the race condition window may be slightly reduced by disabling accept 
> threads, but, to the overall performance and through put, accept threads are 
> very important. I now notice that the issue still exists (regardless of 
> whether or not accept threads are enabled/disabled) and am testing further to 
> confirm the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TS-3714) TS seems to stall between ua_begin and ua_first_read on some transactions resulting in high response times.

Reply via email to