Inspecting the core files, I identified the exact requests that caused the
segfaults. By repeating them and isolating the components, I've found that
the problem is related to the GZIP plug-in. If the plugin is active and the
request carries an Accept-Encoding header that includes gzip, then the
crash occurs whenever our plugin blocks the request. In these situations,
the producers list in HttpTunnel::tunnel_run looks like this:

(gdb) p producers[0]
$2 = {consumer_list = {head = 0x7fffeb365620}, self_consumer = 0x0, *vc =
0x124be60*, vc_handler = (int (HttpSM::*)(HttpSM *, int,
    HttpTunnelProducer *)) 0x60b8ee <HttpSM::tunnel_handler_server(int,
HttpTunnelProducer*)>, read_vio = 0x0, read_buffer = 0x120ba10,
buffer_start = 0x0, *vc_type = HT_HTTP_SERVER*,
  chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action =
ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer =
0x0, dechunked_size = 0, dechunked_reader = 0x0,
    chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes
= 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0,
bytes_left = 0, last_server_event = 0,
    running_sum = 0, num_digits = 0, max_chunk_size = 4096,
max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0},
chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT,
  do_chunking = false, do_dechunking = false, do_chunked_passthru = false,
init_bytes_done = 626, nbytes = 626, ntodo = 0, bytes_read = 0,
handler_state = 0, last_event = 2302,
  num_consumers = 1, alive = false, read_success = true,
flow_control_source = 0x0, name = 0x7b9201 "http server"}
(gdb) p producers[1]
$3 = {consumer_list = {head = 0x7fffeb365690}, self_consumer = 0x0, vc =
0x1, vc_handler = NULL, read_vio = 0x0, read_buffer = 0x120b830,
buffer_start = 0x120b848, vc_type = HT_STATIC,
  chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action =
ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer =
0x0, dechunked_size = 0, dechunked_reader = 0x0,
    chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes
= 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0,
bytes_left = 0, last_server_event = 0,
    running_sum = 0, num_digits = 0, max_chunk_size = 4096,
max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0},
chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT,
  do_chunking = false, do_dechunking = false, do_chunked_passthru = false,
init_bytes_done = 295, nbytes = 295, ntodo = 0, bytes_read = 0,
handler_state = 0, last_event = 0,
  num_consumers = 1, alive = false, read_success = true,
flow_control_source = 0x0, name = 0x7b91b6 "internal msg"}


If the plugin is disabled, or the Accept-Encoding header does not include
gzip, then we have:

(gdb) p producers[0]
$1 = {consumer_list = {head = 0x7fffef587620}, self_consumer = 0x0, vc =
0x1, vc_handler = NULL, read_vio = 0x0, read_buffer = 0x12078f0,
buffer_start = 0x1207908, vc_type = HT_STATIC,
  chunked_handler = {static DEFAULT_MAX_CHUNK_SIZE = 4096, action =
ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer =
0x0, dechunked_size = 0, dechunked_reader = 0x0,
    chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes
= 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0,
bytes_left = 0, last_server_event = 0,
    running_sum = 0, num_digits = 0, max_chunk_size = 4096,
max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0},
chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT,
  do_chunking = false, do_dechunking = false, do_chunked_passthru = false,
init_bytes_done = 295, nbytes = 295, ntodo = 0, bytes_read = 0,
handler_state = 0, last_event = 0,
  num_consumers = 1, alive = false, read_success = true,
flow_control_source = 0x0, name = 0x7b91b6 "internal msg"}
(gdb) p producers[1]
$2 = {consumer_list = {head = 0x0}, self_consumer = 0x0, *vc = 0x0*,
vc_handler = NULL, read_vio = 0x0, read_buffer = 0x0, buffer_start =
0x0, *vc_type
= HT_HTTP_SERVER*, chunked_handler = {
    static DEFAULT_MAX_CHUNK_SIZE = 4096, action =
ChunkedHandler::ACTION_DOCHUNK, chunked_reader = 0x0, dechunked_buffer =
0x0, dechunked_size = 0, dechunked_reader = 0x0,
    chunked_buffer = 0x0, chunked_size = 0, truncation = false, skip_bytes
= 0, state = ChunkedHandler::CHUNK_READ_CHUNK, cur_chunk_size = 0,
bytes_left = 0, last_server_event = 0,
    running_sum = 0, num_digits = 0, max_chunk_size = 4096,
max_chunk_header = '\000' <repeats 15 times>, max_chunk_header_len = 0},
chunking_action = TCA_PASSTHRU_DECHUNKED_CONTENT,
  do_chunking = false, do_dechunking = false, do_chunked_passthru = false,
init_bytes_done = 0, nbytes = 0, ntodo = 0, bytes_read = 0, handler_state =
0, last_event = 0, num_consumers = 0,
  alive = false, read_success = false, flow_control_source = 0x0, name =
0x0}


This behavior is deterministic: we've configured the dsv environment with
the same configuration that we use in the prod environment and re-issued
the same request. Whenever the gzip plugin is active, we get the crash, and
it is always due to a non-NULL vc in HttpTunnel::tunnel_run. Whenever the
plugin is disabled or Accept-Encoding does not include gzip, it works and
the vc is NULL.

Should I open a JIRA issue for the gzip plugin? Do you need any further
information to tackle the problem? If you do, just let me know; as I said,
we can now reproduce the issue deterministically.


Acácio Centeno
Software Engineering
Azion Technologies
Porto Alegre, Brasil +55 51 3012 3005 | +55 51 8118 9947
Miami, USA +1 305 704 8816


Any information in this e-mail and attachments may be confidential and
privileged, protected by legal confidentiality. The use of this document
requires authorization by the issuer, subject to penalties.


2014-09-17 14:59 GMT-03:00 Brian Geffon <briangef...@gmail.com>:

> I wonder if this is related to:
> https://issues.apache.org/jira/browse/TS-2497
>
> On Wed, Sep 17, 2014 at 6:32 AM, Acácio Centeno <acacio.cent...@azion.com>
> wrote:
>
> > Folks,
> >
> > I have a situation that happens about once a week and causes a segfault
> > on ATS (5.0.1). The stack trace always shows Ptr<IOBufferBlock>::operator=
> > as the last function called, and digging a bit I found that this
> > situation happens when, in HttpTunnel::tunnel_run, I have two producers,
> > the first one reading from the cache and the second from the plugin. The
> > cache producer has a non-NULL vc but a NULL buffer_start, so when
> > MIOBuffer::clone_reader is called to clone it, the system dies.
> >
> > I found the URL that led to the last segfault and repeated it thousands
> > of times using curl, without any problems; yet, about once a week we have
> > this same issue, so I think it's a race condition, but have no proof of
> > it.
> >
> > Have you guys ever seen something like this, and if not, do you have any
> > idea how a cache producer could end up with a NULL buffer_start?
> >
> > The full stack trace, as well as more details can be found on
> > http://pastebin.com/avMen1Cy
> >
> > Thanks in advance,
> > Acácio Centeno
> > Software Engineering
> > Azion Technologies
> > Porto Alegre, Brasil +55 51 3012 3005 | +55 51 8118 9947
> > Miami, USA +1 305 704 8816
> >
> >
>
