hi guys.
we've been investigating a couple of core dumps seen during testing, and we
think we've tracked one down to mod_include.
here is the email thread.
does anyone have any reason NOT to reset the state when resetting
the bytes parsed?
It seems to fix this problem, but I'm fearful that it will cause
something else to break...
>
> > - Apparently the brigade only has an EOS bucket, so it is not splittable.
> > But mod_include tries to split it here, which messes up the brigade,
> > and the later use of *bb causes the core dump.
> > Every time I have seen mod_include reach that line, it dumps core
> > soon after.
> >
> > - The question is, why can a brigade that only has an EOS bucket reach here?
> > If it is supposed to be that way, how should such a brigade be split
> > properly?
>
> I think it is valid for the brigade to have only EOS in that situation.
> The brigade has several buckets before the EOS when it enters the
> send_parsed_content function, but the buckets before the EOS may have
> been sent on to the next filter by the time it gets to the end.
>
> The strange thing, though, is that if you reach line 2981, it means that
> there was a partial SSI directive at the end of the content. And indeed
> the ctx->head_start_bucket appears to have an SSI directive at the start
> of its data. But two things are wrong: that bucket isn't on the
> brigade, and it looks like a complete (not partial) SSI directive.
Yeah, the PARSE_DIRECTIVE state should not reach that line for an EOS.
Because the EOS comes from ap_finalize_request_protocol(), it seems to
be the only bucket. So I checked includes_filter().
It tries to reuse the filter's previous ctx at the beginning,
but it only resets bytes_parsed:
    if (!f->ctx) {
        ...create new ctx here...
    }
    else {
        ctx->bytes_parsed = 0;
    }
I added a line after the ctx->bytes_parsed = 0 to reset the state:
    else {
        ctx->bytes_parsed = 0;
        /**/ ctx->state = PRE_HEAD;
    }
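
To make the idea concrete outside of httpd, here is a minimal standalone sketch of the same create-or-reuse pattern. It is only a toy model, not the mod_include code: toy_filter, ssi_ctx and get_ctx are hypothetical names, the struct has just the two fields discussed here, and the enum lists only the two states named in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the parser states; only the two states
 * named in this thread are modelled (assumption). */
typedef enum { PRE_HEAD, PARSE_DIRECTIVE } parse_state;

/* Hypothetical, stripped-down context; the real ctx has more fields. */
typedef struct {
    size_t bytes_parsed;
    parse_state state;
} ssi_ctx;

/* Hypothetical stand-in for a filter that keeps a per-filter ctx. */
typedef struct {
    ssi_ctx *ctx;
} toy_filter;

/* Create the ctx on first use; on reuse, reset both the byte counter
 * and the parser state so nothing leaks over from the previous pass. */
static ssi_ctx *get_ctx(toy_filter *f)
{
    assert(f != NULL);
    if (!f->ctx) {
        f->ctx = calloc(1, sizeof(*f->ctx));
        if (!f->ctx) {
            return NULL;              /* allocation failed */
        }
        f->ctx->state = PRE_HEAD;
    }
    else {
        f->ctx->bytes_parsed = 0;
        f->ctx->state = PRE_HEAD;     /* the proposed extra line */
    }
    return f->ctx;
}
```

In this toy version, a ctx left in PARSE_DIRECTIVE by one pass comes back in PRE_HEAD on the next call, which is the behaviour the one-line change is meant to give.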
It has been running for over half an hour now (with all SSI turned back on
and with the dw and ad modules turned back on), and there are no more core
dumps on port 8001!
(Port 80 still has some, though. I am contacting Justin Wang to see
if I can do some tests there too.)
So is it on purpose that ctx->state is not reset when the ctx is reused?
Is this a good fix? (The problem could still be caused by something
before this...)
-- Jin