[ 
https://issues.apache.org/jira/browse/TS-4916?focusedWorklogId=30700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30700
 ]

ASF GitHub Bot logged work on TS-4916:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Oct/16 22:28
            Start Date: 14/Oct/16 22:28
    Worklog Time Spent: 10m 
      Work Description: Github user gtenev commented on a diff in the pull 
request:

    https://github.com/apache/trafficserver/pull/1100#discussion_r83512075
  
    --- Diff: proxy/http2/Http2Stream.cc ---
    @@ -267,10 +267,12 @@ Http2Stream::do_io_close(int /* flags */)
           // Make sure any trailing end of stream frames are sent
           // Ourselve will be removed at send_data_frames or closing 
connection phase
           static_cast<Http2ClientSession 
*>(parent)->connection_state.send_data_frames(this);
    +
    +      // Make sure the stream is deleted at this point since next step is 
self destroy.
    +      this->delete_stream();
    --- End diff --
    
    This is what actually ensures the stream is removed from the `DLL<>`
    before `destroy()` is triggered (that is, before leaving `do_io_close()`).
    
    In version 6.2.1, `send_data_frames()` deletes the stream from the `DLL<>`
    on `HTTP2_STREAM_STATE_CLOSED`.
    
    A later version added deletion on `HTTP2_STREAM_STATE_HALF_CLOSED_LOCAL` as well.
    
    What corrupted the `DLL<>` in our case was
    `HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE`, which does not cause deletion in any
    version.
    
    It seems to me we have always been vulnerable to this problem despite the
    fixes. That is why I added this “catch-all-delete-stream” line here, to
    cover all current and future missing states. After this point we are going
    to `destroy()` the stream regardless of its state anyway.
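    
A minimal sketch of the "unlink before destroy" idea described above. It is an illustration only, not the Traffic Server code: `Stream`, `StreamList`, the `linked` flag, and `close_stream()` are hypothetical stand-ins, and the real `DLL<>` interface may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for a stream object held in an intrusive list.
struct Stream {
  int id = 0;
  Stream *next = nullptr;
  Stream *prev = nullptr;
  bool linked = false; // illustrative flag: true while the stream is in the list
};

// Hypothetical doubly linked list; NOT the Traffic Server DLL<> template.
struct StreamList {
  Stream *head = nullptr;

  // Insert at the head of the list.
  void push(Stream *s) {
    assert(s != nullptr && !s->linked);
    s->prev = nullptr;
    s->next = head;
    if (head != nullptr) {
      head->prev = s;
    }
    head = s;
    s->linked = true;
  }

  // Idempotent unlink: removing a stream that is not in the list is a no-op,
  // so the caller does not need to know which state-specific path ran earlier.
  void remove(Stream *s) {
    assert(s != nullptr);
    if (!s->linked) {
      return;
    }
    if (s->prev != nullptr) {
      s->prev->next = s->next;
    } else {
      head = s->next;
    }
    if (s->next != nullptr) {
      s->next->prev = s->prev;
    }
    s->next = nullptr;
    s->prev = nullptr;
    s->linked = false;
  }
};

// Close path: unlink unconditionally, whatever the stream state, then destroy.
// No list node can be left pointing at freed memory afterwards.
void close_stream(StreamList &list, Stream *s) {
  list.remove(s);
  delete s;
}
```

Because `remove()` is a no-op for a stream that was already unlinked, the unconditional call in `close_stream()` is safe whether or not an earlier step removed the stream.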



Issue Time Tracking
-------------------

    Worklog Id:     (was: 30700)
    Time Spent: 4h 50m  (was: 4h 40m)

> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> --------------------------------------------------------------------
>
>                 Key: TS-4916
>                 URL: https://issues.apache.org/jira/browse/TS-4916
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, HTTP/2
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>            Priority: Blocker
>             Fix For: 7.1.0
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Http2ConnectionState::restart_streams falls into an infinite loop while 
> holding a lock, which causes cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element's “next” 
> pointer points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x00002aaaaacf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=<optimized out>, edata=<optimized out>) at 
> Http2ConnectionState.cc:823
> #3  0x00002aaaaacef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x00002aaaaacef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x00002aaaaacef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x00002aaaaae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=<optimized out>) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
> event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
> event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x00002aaaaae47653 in SSLNetVConnection::net_read_io 
> (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at 
> SSLNetVConnection.cc:595
> #16 0x00002aaaaae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
> event=<optimized out>, e=<optimized out>) at UnixNet.cc:513
> #17 0x00002aaaaae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
> event=5, this=<optimized out>) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
> this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x00002aaaaae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
> Thread.cc:86
> #21 0x00002aaaad6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
> pthread_create.c:301
> #22 0x00002aaaae8bc93d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> ------- count=0 -------
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> ------- count=1 -------
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> ------- count=2 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> ------- count=3 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . . 
> ------- count=5560 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> {code}
> I am currently working on finding out why the list in question got into this 
> “impossible” (broken) state and on eventually coming up with a fix.
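
The broken list shape in the trace above (the last element's "next" pointing back at itself) can be reproduced in miniature. The sketch below is an assumption-laden illustration, not Traffic Server code: `Node`, `has_cycle()`, and `count_nodes()` are hypothetical names. It uses Floyd's tortoise-and-hare check to detect a list that a plain `for (n = head; n; n = n->next)` loop would never finish walking.

```cpp
#include <cassert>

// Hypothetical node type; only the fields printed in the trace are modeled.
struct Node {
  int id;
  Node *next;
  Node *prev;
};

// Floyd's cycle detection: "slow" advances one node per step and "fast" two.
// If the "next" chain loops (including a node whose next is itself), the two
// pointers eventually meet; otherwise "fast" reaches the end of the list.
bool has_cycle(const Node *head) {
  const Node *slow = head;
  const Node *fast = head;
  while (fast != nullptr && fast->next != nullptr) {
    slow = slow->next;
    fast = fast->next->next;
    if (slow == fast) {
      return true;
    }
  }
  return false;
}

// Counts nodes, refusing (in debug builds) to walk a list that would never
// terminate.
int count_nodes(const Node *head) {
  assert(!has_cycle(head));
  int count = 0;
  for (const Node *n = head; n != nullptr; n = n->next) {
    ++count;
  }
  return count;
}
```

Linking three nodes with ids 29, 27, and 19 so that the last node's `next` is the node itself makes `has_cycle()` return true; resetting that pointer to `nullptr` makes the same list walkable again.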



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
