On Mon, Apr 15, 2013 at 5:42 PM, Jeff King <[email protected]> wrote:
> On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote:
>
>> > +static void check_die_recursion(const char *fmt, va_list ap)
>> > +{
>> > + static int dying;
>> > +
>> > + if (!dying++)
>> > + return;
>> > +
>> > + vreportf("fatal: ", fmt, ap);
>>
>> How do you know it's safe to call vreportf() ?
>
> Because I hand-audited it.
:)
> But I think the more important question you
> are getting at is: how do I know that it will remain safe to call
> vreportf?
Right.
>> If the bug is in the vreportf code path, we will recurse infinitely
>> (at least until the stack is used up). An implementation of vsnprintf
>> exists in compat/snprintf.c for example.
>
> Right. My assumption was that we are primarily interested in protecting
> against the die_routine. Compat functions should never be calling die.
I think the rule we have been enforcing is less strict than that. We
have only said that any compat function in a die handler path should
never call die. But maybe that's what you meant.
> Of course that is assuming nobody violates that rule, which is part of
> the point of the check.
>
>> It's nice to print out the error message here, but I think doing so
>> defeats the purpose of this "dying" check. Better to get the stack
>> trace from a core dump.
>
> Easier said than done, sometimes, if you are debugging somebody else's
> problem from a dump of a terminal session. :)
>
> But I can live with dropping this patch; my primary goal is the bug-fix
> in patch three.
>
> I was also tempted to suggest just dropping the recursion check
> altogether. While it is neat to detect such things, it's a "should never
> happen" bug situation, and an infinite loop of printing out the same
> message is pretty easy to notice. Do you remember what spurred the
> original check? The message in cd163d4 doesn't say.
That's a valid option.
The primary motivation was that Hannes Sixt had to step in and point
out yet again that the high-level memory allocators should not be
called in anything that could be in a die handler code path. I was on
the thread, but I don't remember the topic (ah, Jonathan has stepped
in with the answer). I do remember that I was not the only one who
had forgotten about that rule though.
We didn't actually have someone report that they encountered infinite
recursion, but it seemed easy enough to add a check for it, so...
Unfortunately, I didn't realize that the async functions installed
their own die handler which may not exit the process, allowing die to
be called multiple times.
To implement this check correctly/completely (i.e. detect recursion in
the main thread as well as in any child threads), I think you really
do need to use thread-local storage as you mentioned in 3/3 which
could look something like:
static pthread_key_t dying;
static pthread_once_t dying_once = PTHREAD_ONCE_INIT;
void setup_die_counter(void)
{
pthread_key_create(&dying, NULL);
}
check_die_recursion(void)
{
pthread_once(&dying_once, setup_die_counter);
if (pthread(getspecific(dying)) {
puts("BUG: recursion...");
exit(128);
}
pthread_setspecific(dying, &dying);
}
or maybe the setup could be performed in set_die_routine(), but it
does kinda seem like overkill for a "nicety" like this. So maybe
checking for recursion in just the main thread as this series does is
better than nothing.
-Brandon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html