[ 
https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241374#comment-15241374
 ] 

ASF GitHub Bot commented on TS-3714:
------------------------------------

GitHub user shinrich opened a pull request:

    https://github.com/apache/trafficserver/pull/570

    Ts 3612 - Restructuring client session and transaction processing

    This has been running on a 5.3 base on a few machines in our production 
for about a month, and we are starting to roll it out more widely.  I've also 
run it in our production for a few days on a 6.1 base.  It had some performance 
issues, but the base 6.1 showed the same performance issues, which I've filed 
bugs on.  I also ran these changes on master for a short while; again there 
were some problems, but the same problems appeared in the base code in our 
environment as well.
    
    These changes build upon James' abstraction of the ProxyClientSession.  We 
introduce a ProxyClientTransaction: HttpSM now works with the 
ProxyClientTransaction instead of the HttpClientSession, and Http1 and Http2 
provide subclassed instances of ProxyClientTransaction.  This takes FetchSM 
out of the picture for Http2.  The design document is at 
https://www.dropbox.com/s/j87lph2z66vx05t/ProxyClientSessionRe-architectProposal.pdf?dl=0
We did not implement the SPDY aspects of the design.
    
    SPDY is unchanged.  It still uses FetchSM and seems to work as well as it 
has been.  There are minimal changes in the SPDY code.
    
    The bulk of the code changes are directly attributable to the 
Session/Transaction separation.  In addition, there are a few other changes:
    * When moving to 6.1 I had memory-freeing issues with some of the changes 
for SSL buffering introduced in TS-3714.  Since those changes addressed 
issues in 5.0 that were fixed a different way in 5.2, I pulled out the extra 
buffering as unnecessary.
    * I rolled back the changes for TS-4321, which were merged up yesterday.  
Sorry, I should have jumped on that sooner.  Once FetchSM is out of the 
picture, we should discuss the best way to reduce response header buffering.
    * I made some changes to solve INKContInternal deletes (or lack of 
deletes) that occurred due to changes in the event_counts with the FetchSM 
removal.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shinrich/trafficserver ts-3612

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #570
    
----
commit f4999c35cf4e877b0f36e31a739b1297f3a54bee
Author: Susan Hinrichs <[email protected]>
Date:   2016-04-13T19:57:39Z

    TS-3612: Restructure client session and transaction processing.

commit 748aa23b1c2fd2c99d125e76ecbe97d70cfbcdcf
Author: shinrich <[email protected]>
Date:   2016-04-14T14:50:39Z

    Clang format

commit 1f0b35f13b128c486b0efa71106f27d893fe82ba
Author: shinrich <[email protected]>
Date:   2016-04-14T15:24:24Z

    Revert "TS-4321: Keep response headers in FetchSM as they are (#551)"
    
    This reverts commit 60d07be8b199cc843c5e220ac0f6ed0545040422.
    
    Conflicts:
    
        proxy/http2/Http2ConnectionState.cc
    
    Taking this out for now to put in TS-3612.  Once FetchSM is out of the way, 
we can talk about the
    best way to optimize this case.

----


> TS seems to stall between ua_begin and ua_first_read on some transactions 
> resulting in high response times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: TS-3714
>                 URL: https://issues.apache.org/jira/browse/TS-3714
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 5.3.0
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>             Fix For: 6.0.0
>
>
> An example slow log showing very high *ua_first_read*.
> {code}
> ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url: 
> http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0 
> client state: 0 server state: 9 ua_begin: 0.000 ua_first_read: 42.224 
> ua_read_header_done: 42.224 cache_open_read_begin: 42.224 
> cache_open_read_end: 42.224 dns_lookup_begin: 42.224 dns_lookup_end: 42.224 
> server_connect: 42.224 server_first_read: 42.229 server_read_header_done: 
> 42.229 server_close: 42.229 ua_begin_write: 42.229 ua_close: 42.229 
> sm_finish: 42.229
> {code}
> Initially, I suspected that it might be caused by browsers connecting early, 
> before sending any bytes to TS.  However, this seems to be happening too 
> frequently and with unrealistically high delays between *ua_begin* and 
> *ua_first_read*.
> I suspect it's caused by the code that temporarily disables the read before 
> calling the *TXN_START* hook and re-enables it after the API call out.  The 
> read disable is done via a 0-byte *do_io_read* on the client vc, but the 
> problem is that a valid *mbuf* is still passed.  Based on what I am seeing, 
> it's possible this results in actually enabling the *read_vio* all the way 
> up to *ssl_read_from_net*, for instance (if there's a race condition and 
> there were already bytes from the client resulting in an epoll read event), 
> which would finally disable the read since the *ntodo* (nbytes) is 0.  
> However, this may result in the epoll event being lost until new i/o happens 
> from the client.  I'm running further experiments to confirm the theory.  In 
> most cases, the read buffer already has some bytes by the time the http 
> session and http sm are created, which makes it just work.  But if there's a 
> slight delay in receiving bytes after the connection is made, the epoll 
> mechanism should pick them up; due to the way the temporary read disable is 
> done, though, the event may be lost (this is because ATS uses epoll's 
> edge-triggered mode).
> Some history on this issue - 
> This issue has been a problem for a long time and affects both http and 
> https requests.  When it was first reported, our router operations team 
> eventually closed it, indicating that disabling *accept* threads resolved it 
> ([~yzlai] also reported similar observations and conclusions).  It's 
> possible that the race condition window is slightly reduced by disabling 
> accept threads, but accept threads are very important to overall performance 
> and throughput.  I now notice that the issue still exists (regardless of 
> whether accept threads are enabled or disabled) and am testing further to 
> confirm the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
