[
https://issues.apache.org/jira/browse/TS-4717?focusedWorklogId=26209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-26209
]
ASF GitHub Bot logged work on TS-4717:
--------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/16 21:28
Start Date: 05/Aug/16 21:28
Worklog Time Spent: 10m
Work Description: GitHub user shinrich opened a pull request:
https://github.com/apache/trafficserver/pull/842
TS-4717: Http2 stack explosion.
Added a common state_process_frame_read method to loop over reading frames
while there is data available. The original state_start_frame_read and
state_complete_frame_read call into state_process_frame_read so the event
handling cases still work.
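As an illustration only, the loop-based approach described above might look roughly like the sketch below. This is not the code from the pull request: FrameReaderSketch, process_frame_read, on_read_ready and make_frame are hypothetical names, and the buffer handling is simplified to a byte vector. The only protocol fact assumed is the 9-byte HTTP/2 frame header whose first three bytes carry the payload length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the session object; not the real Http2ClientSession.
struct FrameReaderSketch {
  static constexpr std::size_t kHeaderLen = 9; // HTTP/2 frame header size

  std::vector<std::uint8_t> buffer; // bytes received but not yet consumed
  std::size_t frames_handled = 0;

  // Single place that drains the buffer. Every complete frame is handled in
  // one loop iteration, so stack depth does not grow with the frame count.
  void process_frame_read() {
    std::size_t offset = 0;
    while (buffer.size() - offset >= kHeaderLen) {
      // The first three header bytes hold the payload length, big-endian.
      const std::size_t payload_len =
          (static_cast<std::size_t>(buffer[offset]) << 16) |
          (static_cast<std::size_t>(buffer[offset + 1]) << 8) |
          static_cast<std::size_t>(buffer[offset + 2]);
      if (buffer.size() - offset - kHeaderLen < payload_len) {
        break; // partial frame: stop and wait for the next read event
      }
      offset += kHeaderLen + payload_len;
      ++frames_handled;
    }
    // Drop the consumed bytes; any partial frame stays for the next call.
    buffer.erase(buffer.begin(),
                 buffer.begin() + static_cast<std::ptrdiff_t>(offset));
  }

  // Event entry point: append the new bytes, then delegate to the loop
  // instead of re-dispatching an event for each frame.
  void on_read_ready(const std::vector<std::uint8_t> &data) {
    buffer.insert(buffer.end(), data.begin(), data.end());
    process_frame_read();
  }
};

// Hypothetical helper that builds a zero-filled frame with the given
// payload length; used only to exercise the sketch.
inline std::vector<std::uint8_t> make_frame(std::size_t payload_len) {
  std::vector<std::uint8_t> frame(FrameReaderSketch::kHeaderLen + payload_len, 0);
  frame[0] = static_cast<std::uint8_t>((payload_len >> 16) & 0xff);
  frame[1] = static_cast<std::uint8_t>((payload_len >> 8) & 0xff);
  frame[2] = static_cast<std::uint8_t>(payload_len & 0xff);
  return frame;
}
```

Feeding thousands of small frames in a single read goes through one call to process_frame_read, whereas a pair of handlers that re-dispatch to each other would add stack frames for every HTTP/2 frame consumed.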
Have been running a version of this code on two of our production boxes
for a day. We haven't had a load surge event, so I doubt we have seen a case
that would have caused the stack explosion. But the performance and error
stats seem similar to their peers, so I don't think I have messed up the normal
operating case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shinrich/trafficserver ts-4717
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafficserver/pull/842.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #842
----
commit a166cf0335672abd3514f43a081b4fce045725f2
Author: Susan Hinrichs <[email protected]>
Date: 2016-08-05T14:29:53Z
TS-4717: Http2 stack explosion.
----
Issue Time Tracking
-------------------
Worklog Id: (was: 26209)
Time Spent: 10m
Remaining Estimate: 0h
> Http2 stack explosion
> ---------------------
>
> Key: TS-4717
> URL: https://issues.apache.org/jira/browse/TS-4717
> Project: Traffic Server
> Issue Type: Bug
> Components: HTTP/2
> Reporter: Susan Hinrichs
> Assignee: Susan Hinrichs
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We see this periodically under high traffic load. ATS crashes with 7000+
> frames on the stack. The bulk of them repeat the following frame
> sequence.
> {code}
> #117 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90,
> event=100, data=0x2b0bad0c7cf0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #118 0x000000000064c05d in Http2ClientSession::state_start_frame_read
> (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
> at Http2ClientSession.cc:451
> #119 0x000000000064b0af in Http2ClientSession::main_event_handler
> (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at
> Http2ClientSession.cc:292
> #120 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90,
> event=100, data=0x2b0bad0c7cf0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #121 0x000000000064c386 in Http2ClientSession::state_complete_frame_read
> (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
> at Http2ClientSession.cc:483
> #122 0x000000000064b0af in Http2ClientSession::main_event_handler
> (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at
> Http2ClientSession.cc:292
> #123 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90,
> event=100, data=0x2b0bad0c7cf0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #124 0x000000000064c05d in Http2ClientSession::state_start_frame_read
> (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
> at Http2ClientSession.cc:451
> {code}
> We had cherry picked in the fix for TS-4209 to correctly enforce the
> concurrent stream limit. But in the latest crash of this type, it looks like
> we are pulling small items from cache, so the stream lives and dies on the
> stack. The concurrent active connection count never reaches the limit.
> I am going to try to change the
> state_start_frame_read/state_complete_frame_read logic from recursing
> handlers to a loop.