Inspecting the core files, I found the exact requests that caused the segfaults. By repeating them and isolating the components, I have narrowed the problem down to the gzip plugin: if the plugin is active and the request's Accept-Encoding includes gzip, then whenever our plugin blocks the request we get the crash.
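I can't paste our plugin here, but to make "blocks the request" concrete: a minimal, hypothetical stand-in that denies every transaction the usual way (set an error status, then reenable with TS_EVENT_HTTP_ERROR) would look roughly like the sketch below. The hook and the status code are arbitrary for the example; the real plugin only blocks requests that match its rules.

    /* Sketch only: NOT our real plugin, just a stand-in illustrating what
     * "blocking a request" means with the ATS plugin API. */
    #include <ts/ts.h>

    static int
    block_handler(TSCont contp, TSEvent event, void *edata)
    {
      TSHttpTxn txnp = (TSHttpTxn)edata;
      (void)contp;

      if (event == TS_EVENT_HTTP_READ_REQUEST_HDR) {
        /* Deny the transaction: ATS generates the error response itself. */
        TSHttpTxnSetHttpRetStatus(txnp, TS_HTTP_STATUS_FORBIDDEN);
        TSHttpTxnReenable(txnp, TS_EVENT_HTTP_ERROR);
        return 0;
      }

      TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
      return 0;
    }

    void
    TSPluginInit(int argc, const char *argv[])
    {
      (void)argc;
      (void)argv;
      TSHttpHookAdd(TS_HTTP_READ_REQUEST_HDR_HOOK, TSContCreate(block_handler, NULL));
    }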
In these situations, the producers array in HttpTunnel::tunnel_run looks like this:

(gdb) p producers[0]
$2 = {consumer_list = {head = 0x7fffeb365620}, self_consumer = 0x0, *vc = 0x124be60*, vc_handler = (int (HttpSM::*)(HttpSM *, int, HttpTunnelProducer *)) 0x60b8ee <HttpSM::tunnel_handler_server(int, HttpTunnelProducer*)>, read_vio = 0x0, read_buffer = 0x120ba10, buffer_start = 0x0, *vc_type = HT_HTTP_SERVER*, chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action = ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer = 0x0, dechunked_size = 0, dechunked_reader = 0x0, chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes = 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0, bytes_left = 0, last_server_event = 0, running_sum = 0, num_digits = 0, max_chunk_size = 4096, max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0}, chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT, do_chunking = false, do_dechunking = false, do_chunked_passthru = false, init_bytes_done = 626, nbytes = 626, ntodo = 0, bytes_read = 0, handler_state = 0, last_event = 2302, num_consumers = 1, alive = false, read_success = true, flow_control_source = 0x0, name = 0x7b9201 "http server"}

(gdb) p producers[1]
$3 = {consumer_list = {head = 0x7fffeb365690}, self_consumer = 0x0, vc = 0x1, vc_handler = NULL, read_vio = 0x0, read_buffer = 0x120b830, buffer_start = 0x120b848, vc_type = HT_STATIC, chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action = ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer = 0x0, dechunked_size = 0, dechunked_reader = 0x0, chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes = 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0, bytes_left = 0, last_server_event = 0, running_sum = 0, num_digits = 0, max_chunk_size = 4096, max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0}, chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT, do_chunking = false, do_dechunking = false, do_chunked_passthru = false, init_bytes_done = 295, nbytes = 295, ntodo = 0, bytes_read = 0, handler_state = 0, last_event = 0, num_consumers = 1, alive = false, read_success = true, flow_control_source = 0x0, name = 0x7b91b6 "internal msg"}

If the plugin is disabled, or if Accept-Encoding does not include gzip, we get instead:

(gdb) p producers[0]
$1 = {consumer_list = {head = 0x7fffef587620}, self_consumer = 0x0, vc = 0x1, vc_handler = NULL, read_vio = 0x0, read_buffer = 0x12078f0, buffer_start = 0x1207908, vc_type = HT_STATIC, chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action = ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer = 0x0, dechunked_size = 0, dechunked_reader = 0x0, chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes = 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0, bytes_left = 0, last_server_event = 0, running_sum = 0, num_digits = 0, max_chunk_size = 4096, max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0}, chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT, do_chunking = false, do_dechunking = false, do_chunked_passthru = false, init_bytes_done = 295, nbytes = 295, ntodo = 0, bytes_read = 0, handler_state = 0, last_event = 0, num_consumers = 1, alive = false, read_success = true, flow_control_source = 0x0, name = 0x7b91b6 "internal msg"}

(gdb) p producers[1]
$2 = {consumer_list = {head = 0x0}, self_consumer = 0x0, *vc = 0x0*, vc_handler = NULL, read_vio = 0x0, read_buffer = 0x0, buffer_start = 0x0, *vc_type = HT_HTTP_SERVER*, chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action = ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer = 0x0, dechunked_size = 0, dechunked_reader = 0x0, chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes = 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0, bytes_left = 0, last_server_event = 0, running_sum = 0, num_digits = 0, max_chunk_size = 4096, max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0}, chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT, do_chunking = false, do_dechunking = false, do_chunked_passthru = false, init_bytes_done = 0, nbytes = 0, ntodo = 0, bytes_read = 0, handler_state = 0, last_event = 0, num_consumers = 0, alive = false, read_success = false, flow_control_source = 0x0, name = 0x0}

This behavior is deterministic: we configured the dsv environment with the same configuration we use in production and re-issued the same request. Whenever the gzip plugin is enabled we get the crash, and it is always due to a non-NULL vc in HttpTunnel::tunnel_run; whenever the plugin is disabled, or Accept-Encoding does not include gzip, it works and the vc is NULL.
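To spell out the invariant that seems to be violated, here is a tiny self-contained model (this is not ATS source; the struct only carries the fields visible in the dumps above, and the pointer values are the ones from the gzip-enabled core). The gzip case fails the check, the other one passes:

    // Model only: a trimmed-down HttpTunnelProducer with just the fields from the
    // core dumps, plus the sanity check tunnel_run seems to rely on implicitly:
    // a producer that has a vc must also have a buffer_start, otherwise cloning
    // its reader dereferences NULL.
    #include <cstdio>
    #include <initializer_list>

    struct Producer {
      const char *name;
      void       *vc;            // the VC pointer in the real struct
      void       *read_buffer;   // MIOBuffer *
      void       *buffer_start;  // IOBufferReader *
    };

    static bool producer_is_sane(const Producer &p) {
      return p.vc == nullptr || p.buffer_start != nullptr;
    }

    int main() {
      // producers[0] with the gzip plugin active (values from the dump above):
      Producer gzip_case = {"http server", (void *)0x124be60, (void *)0x120ba10, nullptr};
      // producers[1] with the plugin disabled / no gzip in Accept-Encoding:
      Producer no_gzip_case = {"http server", nullptr, nullptr, nullptr};

      for (const Producer &p : {gzip_case, no_gzip_case}) {
        std::printf("%-12s vc=%p buffer_start=%p -> %s\n", p.name, p.vc, p.buffer_start,
                    producer_is_sane(p) ? "ok" : "would die cloning the reader");
      }
      return 0;
    }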
Should I open a JIRA issue for the gzip plugin? Do you guys need any further information to tackle the problem? If you do, just let me know; as I said, we can now reproduce the issue deterministically.

Acácio Centeno
Software Engineering
Azion Technologies
Porto Alegre, Brasil +55 51 3012 3005 | +55 51 8118 9947
Miami, USA +1 305 704 8816

Any information in this e-mail and attachments may be confidential and privileged, protected by legal confidentiality. The use of this document requires authorization by the issuer, subject to penalties.

2014-09-17 14:59 GMT-03:00 Brian Geffon <briangef...@gmail.com>:
> I wonder if this is related to:
> https://issues.apache.org/jira/browse/TS-2497
>
> On Wed, Sep 17, 2014 at 6:32 AM, Acácio Centeno <acacio.cent...@azion.com> wrote:
>
> > Folks,
> >
> > I have a situation that happens about once a week and causes a segfault
> > on ATS (5.0.1). The stack trace always shows Ptr<IOBufferBlock>::operator=
> > as the last function called, and digging a bit I found that this situation
> > happens when, in HttpTunnel::tunnel_run, I have two producers, the first
> > one reading from the cache and a second from the plugin. The cache producer
> > has a non-NULL vc but a NULL buffer_start, so when MIOBuffer::clone_reader
> > is called to clone it, the system dies.
> >
> > I found the URL that led to the last segfault and repeated it thousands of
> > times using curl, without any problems; yet, about once a week we have this
> > same issue, so I think it is a race condition, but have no proof of it.
> >
> > Have you guys ever seen something like this and, if not, do you have any
> > idea how it could be that a cache producer would have a NULL buffer_start?
> >
> > The full stack trace, as well as more details, can be found at
> > http://pastebin.com/avMen1Cy
> >
> > Thanks in advance,
> > Acácio Centeno
> > Software Engineering
> > Azion Technologies
> > Porto Alegre, Brasil +55 51 3012 3005 | +55 51 8118 9947
> > Miami, USA +1 305 704 8816
> >
> > Any information in this e-mail and attachments may be confidential and
> > privileged, protected by legal confidentiality. The use of this document
> > requires authorization by the issuer, subject to penalties.
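Coming back to the question in the quoted message (how a cache/server producer ends up with a vc but a NULL buffer_start by the time its reader is cloned): I don't have an answer, but to illustrate the spot where it hurts, here is a purely hypothetical guard, using the same stand-in types as the model above. This is not the actual HttpTunnel code, and refusing the clone may well not be the right recovery:

    // Hypothetical guard only; the real call site is wherever tunnel_run clones
    // the producer's buffer_start for its consumers.
    #include <cstdio>

    struct IOBufferReader { };
    struct MIOBuffer {
      // Stand-in: the real MIOBuffer::clone_reader copies state out of *r, which
      // is where a NULL r blows up (the Ptr<IOBufferBlock>::operator= frame).
      IOBufferReader *clone_reader(IOBufferReader *r) { return r; }
    };
    struct Producer {
      void           *vc;
      MIOBuffer      *read_buffer;
      IOBufferReader *buffer_start;
    };

    static IOBufferReader *
    clone_producer_reader(Producer &p)
    {
      if (p.vc != nullptr && p.buffer_start == nullptr) {
        // The state we see with the gzip plugin active: refuse the clone (and
        // presumably tear the tunnel down) instead of segfaulting.
        std::fprintf(stderr, "producer has a vc but no buffer_start\n");
        return nullptr;
      }
      return p.buffer_start != nullptr ? p.read_buffer->clone_reader(p.buffer_start) : nullptr;
    }

    int main()
    {
      MIOBuffer buf;
      Producer  broken = {(void *)0x124be60, &buf, nullptr};  // the gzip-enabled state
      return clone_producer_reader(broken) == nullptr ? 1 : 0;
    }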