[ 
https://issues.apache.org/jira/browse/TS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284614#comment-15284614
 ] 

Susan Hinrichs commented on TS-4444:
------------------------------------

Given that we are seeing the read buffer clear in two different places, I'm 
concerned that this is a race condition. [~calavera]'s patch looks good, but I 
fear that it is only treating a symptom, not addressing the real problem.

I tracked down a similar problem with disappearing buffer readers this winter 
in our internal code base.  It ultimately turned out to be an issue with a 
plugin, which was deleting the buffer first and the buffer reader second.  If 
you were unlucky, the buffer had been reallocated by the time the buffer reader 
was freed, so the late free nulled the buffer reader at a seemingly random 
point during the second transaction.  Maybe something similar is happening 
here.  I'll take a look at the gzip plugin.  [~oknet], are you running any 
plugins?
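
A minimal sketch of that failure mode follows. It is a toy model, not the 
actual ATS code: Buffer, Reader, Pool, and second_reader_survives are 
hypothetical names, and the pool has a single slot so that the "unlucky" 
reuse happens every time. The only thing it is meant to show is why the order 
of the two frees matters once buffers are recycled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical stand-ins, not the real
// ATS buffer, buffer reader, or allocator classes.
struct Buffer;

struct Reader {
  Buffer *owner = nullptr; // nullptr means "this reader slot is unused"
};

struct Buffer {
  std::array<Reader, 2> readers; // reader slots live inside the buffer

  // Hands out the first unused reader slot, or nullptr if none is left.
  Reader *alloc_reader() {
    for (Reader &r : readers) {
      if (r.owner == nullptr) {
        r.owner = this;
        return &r;
      }
    }
    return nullptr;
  }
};

// Freeing a reader just clears its slot.
void free_reader(Reader *r) { r->owner = nullptr; }

// Recycling pool: destroy() puts a buffer on the free list and create()
// hands the same storage out again. Storage is never released, so the
// sketch itself has no undefined behaviour.
class Pool {
public:
  explicit Pool(std::size_t n) : storage_(n) {
    for (Buffer &b : storage_) free_.push_back(&b);
  }
  Buffer *create() {
    assert(!free_.empty());
    Buffer *b = free_.back();
    free_.pop_back();
    *b = Buffer{}; // reset state for the new owner
    return b;
  }
  void destroy(Buffer *b) { free_.push_back(b); }

private:
  std::vector<Buffer> storage_;
  std::vector<Buffer *> free_;
};

// Returns true if the second transaction's reader is still valid at the end.
bool second_reader_survives(bool free_reader_first) {
  Pool pool(1); // one slot, so the buffer is always reused

  // First transaction (the plugin in the story above).
  Buffer *buf1 = pool.create();
  Reader *stale = buf1->alloc_reader();
  if (free_reader_first) {
    free_reader(stale); // safe order: reader first, then buffer
    pool.destroy(buf1);
  } else {
    pool.destroy(buf1); // problematic order: buffer returned first
  }

  // Second transaction receives the recycled buffer.
  Buffer *buf2 = pool.create();
  Reader *live = buf2->alloc_reader();

  if (!free_reader_first) {
    free_reader(stale); // late free clears the slot buf2 now owns
  }
  return live->owner == buf2;
}
```

In this model, second_reader_survives(true) leaves the second transaction's 
reader intact, while second_reader_survives(false) leaves it nulled even 
though the second transaction never freed it.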



> Segfault accessing NULL connection buffer reader
> ------------------------------------------------
>
>                 Key: TS-4444
>                 URL: https://issues.apache.org/jira/browse/TS-4444
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: David Calavera
>            Assignee: Susan Hinrichs
>             Fix For: 7.0.0
>
>
> A few days ago, we found a segfault trying to use the master branch in 
> staging. Using ATS as a proxy to another http server, we managed to segfault 
> ATS with simple client requests. 
> This problem is hard to reproduce. We've seen it when connecting clients 
> from Mac OS X and Linux, but it doesn't always happen.
> This is the crash:
> {code}
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff2f83700 (LWP 20224)]
> HttpTunnel::consumer_handler (this=this@entry=0x7fffe8fa4d28, event=103, c=0x7fffe8fa4dd8) at HttpTunnel.cc:1340
> 1340        c->buffer_reader->mbuf->dealloc_reader(c->buffer_reader);
> (gdb) bt
> #0  HttpTunnel::consumer_handler (this=this@entry=0x7fffe8fa4d28, event=103, c=0x7fffe8fa4dd8) at HttpTunnel.cc:1340
> #1  0x00000000005ea395 in HttpTunnel::main_handler (this=0x7fffe8fa4d28, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1574
> #2  0x000000000068df8f in handleEvent (data=0x7fffe4218e00, event=<optimized out>, this=<optimized out>) at ../../iocore/eventsystem/I_Continuation.h:153
> #3  CacheVC::calluser (this=0x7fffe4218bb0, event=<optimized out>) at ../../iocore/cache/P_CacheInternal.h:628
> #4  0x0000000000704c95 in CacheVC::openWriteMain (this=0x7fffe4218bb0) at CacheWrite.cc:1350
> #5  0x00000000007627d0 in handleEvent (data=0x7fffe41d4f80, event=1, this=<optimized out>) at I_Continuation.h:153
> #6  EThread::process_event (this=0x7ffff3085010, e=0x7fffe41d4f80, calling_code=1) at UnixEThread.cc:130
> #7  0x000000000076348b in EThread::execute (this=0x7ffff3085010) at UnixEThread.cc:184
> #8  0x000000000076224a in spawn_thread_internal (a=0x10affa0) at Thread.cc:86
> #9  0x00007ffff6890182 in start_thread (arg=0x7ffff2f83700) at pthread_create.c:312
> #10 0x00007ffff5b9947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {code}
> We were able to reproduce the issue with this simple Go program:
> {code:java}
> package main
> import (
>       "fmt"
>       "html"
>       "log"
>       "net/http"
> )
> func main() {
>       http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
>               w.Header().Set("Content-Type", "text/html; charset=UTF-8")
>               w.Header().Set("Cache-Control", "public, max-age=0, must-revalidate")
>               fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
>       })
>       log.Fatal(http.ListenAndServe(":9393", nil))
> }
> {code}
> We redirect ATS to this program with a very simple remap rule:
> {code}
> regex_map http://.*  http://10.1.10.17:9393
> {code}
> Using `git bisect` we found that the issue was introduced in this commit, 
> although we don't really understand why:
> https://github.com/apache/trafficserver/commit/af76977adb9f3c0296a232688bbcb5a1421a6768
> I have a patch ready that seems to fix the problem and that I'll push as a 
> pull request to GitHub asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
