[
https://issues.apache.org/jira/browse/TS-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682343#comment-14682343
]
ASF GitHub Bot commented on TS-3797:
------------------------------------
GitHub user shinrich opened a pull request:
https://github.com/apache/trafficserver/pull/275
TS-3797: Crashes due to cross-thread race conditions.
We have been testing a version of this in production on a few machines for
the past two weeks. We are expanding the testing to more machines.
This patch is slightly different from what we are running in production.
I've done a bit of cleanup (removing asserts and unifying changed methods that
didn't really need to be there). The biggest difference from what we are
running in production is moving the locking logic in
HttpSessionManager::acquire_session to release the global pool lock sooner.
I'm running this version on a single test box in production at this point.
Will roll these changes back into broader production after the initial version
passes muster (plus changes that get identified in this review).
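
To illustrate what releasing the global pool lock sooner can look like, here
is a minimal sketch. It is not the code in the patch: GlobalPool, Session, and
the use of std::mutex are hypothetical stand-ins chosen for illustration, and
the real HttpSessionManager::acquire_session uses ATS's own locking
primitives. The idea shown is only that the lock covers the pool lookup and
removal, and the per-session work happens after the lock is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a pooled server session.
struct Session {
  std::string host;
  bool attached = false;
};

// Hypothetical stand-in for a global session pool shared by all threads.
class GlobalPool {
public:
  // Return a session to the pool.
  void release(std::unique_ptr<Session> s) {
    std::lock_guard<std::mutex> guard(mutex_);
    std::string key = s->host;
    pool_.emplace(std::move(key), std::move(s));
  }

  // Take a session for `host` out of the pool, or return nullptr.
  std::unique_ptr<Session> acquire(const std::string &host) {
    std::unique_ptr<Session> found;
    {
      // Hold the global lock only for the lookup and removal.
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = pool_.find(host);
      if (it != pool_.end()) {
        found = std::move(it->second);
        pool_.erase(it);
      }
    } // Global lock released here.

    // Per-session work runs without the global lock held, so other
    // threads are not blocked on the pool while it happens.
    if (found) {
      found->attached = true;
    }
    return found;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return pool_.size();
  }

private:
  std::mutex mutex_;
  std::unordered_multimap<std::string, std::unique_ptr<Session>> pool_;
};
```

Once the session has been removed from the pool, no other thread can find it,
which is why the remaining work in this sketch does not need the global lock.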
Article describing the design of VC Migration:
https://cwiki.apache.org/confluence/display/TS/Threading+Issues+And+NetVC+Migration
Also, a heads up: while I'm calling this VC migration, the only things
that migrate are the socket and the SSL object. The actual VC data
structures are re-created on the new thread.
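
As a rough sketch of that distinction (hypothetical types only, not the
structures used in the patch): the old connection object gives up its socket
descriptor and SSL handle, and a fresh connection object is built around them
for the new thread. Connection, SslHandle, MigratedState, detach, and adopt
are all names invented here for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the SSL object that travels with the socket.
struct SslHandle {
  int id;
};

// Hypothetical stand-in for a per-thread connection (VC) structure.
struct Connection {
  int fd = -1;              // socket descriptor
  SslHandle *ssl = nullptr; // SSL object, if any
  int owner_thread = -1;    // thread that owns this structure
};

// The only state that actually moves between threads in this sketch.
struct MigratedState {
  int fd;
  SslHandle *ssl;
};

// Strip the socket and SSL handle out of the old connection so that it
// can be torn down on its own thread without closing either of them.
MigratedState detach(Connection &old_vc) {
  MigratedState st{old_vc.fd, old_vc.ssl};
  old_vc.fd = -1;
  old_vc.ssl = nullptr;
  return st;
}

// Build a brand-new connection structure owned by the new thread.
Connection adopt(MigratedState st, int new_thread) {
  Connection vc;
  vc.fd = st.fd;
  vc.ssl = st.ssl;
  vc.owner_thread = new_thread;
  return vc;
}
```

In this sketch the old structure is never touched by the new thread after
detach, which is the property that avoids cross-thread access to it.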
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shinrich/trafficserver ts-3797
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafficserver/pull/275.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #275
----
commit fe7edce412a8d066a1207cefb421824b4a2ac0ce
Author: shinrich
Date: 2015-08-11T19:20:46Z
TS-3797: Crashes due to cross-thread race conditions.
----
> Crashes due to cross-thread race conditions
> -------------------------------------------
>
> Key: TS-3797
> URL: https://issues.apache.org/jira/browse/TS-3797
> Project: Traffic Server
> Issue Type: Bug
> Affects Versions: 5.3.0
> Reporter: Susan Hinrichs
> Assignee: Susan Hinrichs
> Labels: yahoo
> Fix For: 6.1.0
>
>
> We had seen crashes with the following stack trace occasionally, but
> recently we have found an environment where these crashes happen so
> frequently that running ATS with global session pools is not feasible.
> {code}
> #0 0x00000000004fac6e in Ptr<IOBufferBlock>::operator IOBufferBlock* (
> this=0x10) at ../lib/ts/Ptr.h:300
> #1 0x00000000005109a2 in IOBufferReader::read_avail (this=0x0)
> at ../iocore/eventsystem/P_IOBuffer.h:606
> #2 0x0000000000777b54 in write_to_net_io (nh=0x2acc365358a0,
> vc=0x2acd38024960, thread=0x2acc36532010) at UnixNetVConnection.cc:540
> #3 0x000000000077747a in write_to_net (nh=0x2acc365358a0, vc=0x2acd38024960,
> thread=0x2acc36532010) at UnixNetVConnection.cc:407
> #4 0x0000000000770378 in NetHandler::mainNetEvent (this=0x2acc365358a0,
> event=5, e=0x2244730) at UnixNet.cc:562
> #5 0x0000000000510560 in Continuation::handleEvent (this=0x2acc365358a0,
> event=5, data=0x2244730) at ../iocore/eventsystem/I_Continuation.h:145
> #6 0x0000000000796ffe in EThread::process_event (this=0x2acc36532010,
> e=0x2244730, calling_code=5) at UnixEThread.cc:128
> #7 0x0000000000797508 in EThread::execute (this=0x2acc36532010)
> at UnixEThread.cc:252
> #8 0x00000000007965a9 in spawn_thread_internal (a=0x2115540) at Thread.cc:85
> #9 0x00002acc2edd49d1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00000032750e88fd in clone () from /lib64/libc.so.6
> {code}
> See
> https://cwiki.apache.org/confluence/display/TS/Threading+Issues+And+NetVC+Migration
> for analysis of the crash and a suggested solution.