[ 
https://issues.apache.org/jira/browse/TS-4916?focusedWorklogId=31218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31218
 ]

ASF GitHub Bot logged work on TS-4916:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Oct/16 02:26
            Start Date: 27/Oct/16 02:26
    Worklog Time Spent: 10m 
      Work Description: GitHub user gtenev opened a pull request:

    https://github.com/apache/trafficserver/pull/1157

    TS-4916 Add safety net to avoid H2-infinite-loop deadlock.

    Current Http2ConnectionState implementation uses a memory pool for
    instantiating streams and DLL<> stream_list for storing active streams.
    Destroying a stream before deleting it from stream_list and then creating
    a new one + reusing the same chunk from the memory pool right away always
    leads to destroying the DLL structure (deadlocks, inconsistencies).
    Added a safety net since the consequences are disastrous.
    Until the design/implementation changes it seems less error prone
    to (double) delete before destroying (noop if already deleted).
    
    (cherry picked from commit a6f9337f61c980739e08bbefeb5f6409b7dcdac1)
    
    Conflicts:
        proxy/http2/Http2ConnectionState.cc

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gtenev/trafficserver h2_loop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/1157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1157
    
----
commit 8b3d67d8f3bdb7e837c766e57ae75286f1eb01a3
Author: Gancho Tenev <gtte...@gmail.com>
Date:   2016-10-17T22:10:12Z

    TS-4916 Add safety net to avoid H2-infinite-loop deadlock.
    
    Current Http2ConnectionState implementation uses a memory pool for
    instantiating streams and DLL<> stream_list for storing active streams.
    Destroying a stream before deleting it from stream_list and then creating
    a new one + reusing the same chunk from the memory pool right away always
    leads to destroying the DLL structure (deadlocks, inconsistencies).
    Added a safety net since the consequences are disastrous.
    Until the design/implementation changes it seems less error prone
    to (double) delete before destroying (noop if already deleted).
    
    (cherry picked from commit a6f9337f61c980739e08bbefeb5f6409b7dcdac1)
    
    Conflicts:
        proxy/http2/Http2ConnectionState.cc

----


Issue Time Tracking
-------------------

    Worklog Id:     (was: 31218)
    Time Spent: 7.5h  (was: 7h 20m)

> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> --------------------------------------------------------------------
>
>                 Key: TS-4916
>                 URL: https://issues.apache.org/jira/browse/TS-4916
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, HTTP/2
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>            Priority: Blocker
>             Fix For: 7.0.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Http2ConnectionState::restart_streams falls into an infinite loop while 
> holding a lock, which leads to cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element “next” 
> points to the element itself and the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x00002aaaaacf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=<optimized out>, edata=<optimized out>) at 
> Http2ConnectionState.cc:823
> #3  0x00002aaaaacef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x00002aaaaacef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x00002aaaaacef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x00002aaaaae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=<optimized out>) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
> event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
> event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x00002aaaaae47653 in SSLNetVConnection::net_read_io 
> (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at 
> SSLNetVConnection.cc:595
> #16 0x00002aaaaae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
> event=<optimized out>, e=<optimized out>) at UnixNet.cc:513
> #17 0x00002aaaaae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
> event=5, this=<optimized out>) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
> this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x00002aaaaae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
> Thread.cc:86
> #21 0x00002aaaad6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
> pthread_create.c:301
> #22 0x00002aaaae8bc93d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> ------- count=0 -------
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> ------- count=1 -------
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> ------- count=2 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> ------- count=3 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . . 
> ------- count=5560 -------
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> {code}
> Currently I am working on finding out why the list in question got into this 
> “impossible” (broken) state and and eventually coming up with a fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to