[jira] [Work logged] (TS-4796) ATS not closing origin connections on first RST from client

ASF GitHub Bot (JIRA) Tue, 25 Oct 2016 14:49:25 -0700

     [ 
https://issues.apache.org/jira/browse/TS-4796?focusedWorklogId=31063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31063
 ]


ASF GitHub Bot logged work on TS-4796:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Oct/16 21:48
            Start Date: 25/Oct/16 21:48
    Worklog Time Spent: 10m 
      Work Description: Github user jacksontj commented on the issue:

    https://github.com/apache/trafficserver/pull/947
  
    After doing some testing with this patch, I see crashes where write_to_net 
is being called with a null vc lock.
    
    ```
    (gdb) list
    416   }
    417   while ((vc = write_ready_list.dequeue())) {
    418     if (vc->closed)
    419       close_UnixNetVConnection(vc, trigger_event->ethread);
    420     else if ((vc->write.enabled && vc->write.triggered) || vc->write.error)
    421       write_to_net(this, vc, trigger_event->ethread);
    422     else if (!vc->write.enabled) {
    423       write_ready_list.remove(vc);
    ```
    
    Seems that vc->write.error forces write_to_net to be called, but nothing is 
checking that the vc's write VIO mutex (s->vio.mutex) is non-null.
    
    Specifically:
    
    ```
    (gdb) p lock
    $1 = {m = {m_ptr = 0x0}, lock_acquired = 157}
    (gdb) list
    382 write_to_net_io(NetHandler *nh, UnixNetVConnection *vc, EThread *thread)
    383 {
    384   NetState *s = &vc->write;
    385   ProxyMutex *mutex = thread->mutex;
    386 
    387   MUTEX_TRY_LOCK_FOR(lock, s->vio.mutex, thread, s->vio._cont);  // <-- this line, specifically the s->vio.mutex
    388 
    389   if (!lock.is_locked() || lock.get_mutex() != s->vio.mutex.m_ptr) {
    390     write_reschedule(nh, vc);
    391     return;
    ```
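
The failure mode above can be modeled outside of ATS. The following is a minimal sketch, not the actual patch: `FakeVC`, `FakeNetState`, `FakeVIO`, `Action`, and `classify_write_ready` are hypothetical stand-ins invented for illustration, and the null-mutex guard is an assumed mitigation. It follows the branch order of the `write_ready_list` loop in the first listing and declines to take the write path when no VIO mutex has been set up.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for illustration only; these are NOT the real
// Traffic Server types (UnixNetVConnection, NetState, VIO, ProxyMutex).
struct FakeVIO {
  std::shared_ptr<std::mutex> mutex; // stays null until an I/O operation is set up
};

struct FakeNetState {
  bool enabled   = false;
  bool triggered = false;
  bool error     = false;
  FakeVIO vio;
};

struct FakeVC {
  bool closed = false;
  FakeNetState write;
};

enum class Action { Close, Write, Remove, Skip };

// Follows the branch order of the write_ready_list loop, plus one assumed
// guard: the write path is never taken when the VIO mutex is null.
Action classify_write_ready(const FakeVC &vc)
{
  if (vc.closed) {
    return Action::Close;
  }
  if ((vc.write.enabled && vc.write.triggered) || vc.write.error) {
    if (!vc.write.vio.mutex) {
      return Action::Skip; // assumed guard: there is nothing to lock
    }
    return Action::Write;
  }
  if (!vc.write.enabled) {
    return Action::Remove;
  }
  return Action::Skip;
}
```

With `write.error` set and no mutex, the sketch returns `Action::Skip` instead of reaching the lock attempt. Whether the real code should skip, close the connection, or do something else in that case is left open here.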


Issue Time Tracking
-------------------

    Worklog Id:     (was: 31063)
    Time Spent: 8h 50m  (was: 8h 40m)

> ATS not closing origin connections on first RST from client
> -----------------------------------------------------------
>
>                 Key: TS-4796
>                 URL: https://issues.apache.org/jira/browse/TS-4796
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HTTP
>            Reporter: Thomas Jackson
>            Assignee: Thomas Jackson
>             Fix For: 7.1.0
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> *TL;DR: similar to TS-4720 -- slower to close than it should be, instead of never 
> closing*
> As a continuation of TS-4720, while testing that the session is closed when 
> we expect, I found that it isn't.
> Although we are now closing the sessions, we aren't doing it as quickly as we 
> should. In this client abort case we expect the client to abort, and ATS 
> should initially continue to send bytes to the client, as we are in the 
> half-open state. After the first set of bytes is sent to the client, the 
> client will send an RST, which should signal ATS to stop sending the request 
> (and tear down the origin connection etc.).
> I'm able to reproduce this locally, and the debug output (with some 
> additional comments) looks like below:
> {code}
> < FIN FROM CLIENT >
> [Aug 29 18:25:07.491] Server {0x7effa538a800} DEBUG: <HttpSM.cc:2649 
> (main_handler)> (http) [0] [HttpSM::main_handler, VC_EVENT_EOS]
> [Aug 29 18:25:07.491] Server {0x7effa538a800} DEBUG: <HttpSM.cc:892 
> (state_watch_for_client_abort)> (http) [0] 
> [&HttpSM::state_watch_for_client_abort, VC_EVENT_EOS]
> < RST FROM CLIENT >
> Got an HttpTunnel event 100 
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1173 
> (producer_handler)> (http_tunnel) [0] producer_handler [http server 
> VC_EVENT_READ_READY]
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1108 
> (producer_handler_chunked)> (http_tunnel) [0] producer_handler_chunked [http 
> server VC_EVENT_READ_READY]
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:203 
> (read_size)> (http_chunk) read chunk size of 15 bytes
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:279 
> (read_chunk)> (http_chunk) completed read of chunk of 15 bytes
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1213 
> (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] 
> enable_redirection: [1 0 0] event: 100
> Got an HttpTunnel event 101 
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1373 
> (consumer_handler)> (http_tunnel) [0] consumer_handler [user agent 
> VC_EVENT_WRITE_READY]
> write ready consumer_handler
> {code}
> In this situation the connection doesn't close at the RST, but rather 
> on the next set of bytes from the origin to send -- which ends up tripping a 
> VC_EVENT_ERROR and tearing down the connection.
> When the client sends the first RST, epoll returns a WRITE_READY event, 
> which the HttpTunnel consumer ignores completely. It seems that when we 
> receive the WRITE_READY event we need to determine whether we are already in the 
> writing state -- and if so, we should stop the transaction (since we are 
> already edge-triggered).
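
One possible reading of the proposal in the description, sketched under stated assumptions: if the consumer was not waiting for the socket to become writable, a WRITE_READY is unexpected and may indicate a state change such as the RST. `ConsumerWriteState`, `ConsumerAction`, and `on_write_ready` are hypothetical names for illustration and are not part of the HttpTunnel API.

```cpp
#include <cassert>

// Hypothetical model for illustration; not the HttpTunnel consumer interface.
enum class ConsumerAction { ResumeWrite, CheckForAbort };

struct ConsumerWriteState {
  // Assumed bookkeeping: set when a previous write could not complete and
  // the consumer is waiting for the socket to become writable again.
  bool blocked_on_socket = false;
};

// Decide what a WRITE_READY notification means for this consumer.
ConsumerAction on_write_ready(ConsumerWriteState &st)
{
  if (st.blocked_on_socket) {
    // Expected wakeup: the socket drained, so continue writing.
    st.blocked_on_socket = false;
    return ConsumerAction::ResumeWrite;
  }
  // Unexpected wakeup: we were already able to write, so something else
  // changed on the socket. Flag it for an abort check instead of ignoring it.
  return ConsumerAction::CheckForAbort;
}
```

The sketch returns `CheckForAbort` rather than aborting outright, because an unexpected wakeup alone may not prove a reset. A real implementation would probably need to confirm the socket's error state before tearing down the transaction.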



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
