On Mon, Jan 4, 2016 at 4:10 PM, Randell Jesup <rje...@jesup.org> wrote:
>>Kyle Huey writes:
>>
>>> (This is a continuation of the discussion in bug 1218297)
>>>
>>> In bug 1155059 we made nsIEventTarget::Dispatch take an
>>> already_AddRefed instead of a raw pointer.  This was done to allow the
>>> dispatcher to transfer its reference to the runnable to the thread the
>>> runnable will run on.  That solves a race condition we've had
>>> historically where the destructor of a runnable can run on either the
>>> dispatching thread or the executing thread, depending on whether the
>>> executing thread can run the event to completion before the
>>> dispatching thread destroys the nsCOMPtr on the stack.
>>
>>IIUC solving the race condition wasn't the goal of the change in
>>API, but something that was done to retain existing workarounds
>>for leaks.
>
> Solving the "who knows what thread this will be released on" was a
> primary goal.  See comment 0, and the discussion here it references.
>
> Independently, bsmedberg wanted to make Dispatch infallible (at least
> normally), thus making the pattern of "avoid leak in the case of
> Dispatch failing" irrelevant (once done, which it hasn't been).

This seems like the best path forward.

> I started the conversion of Dispatch(rawptr) in the tree to
> Dispatch(already_AddRefed<>); xidorn took over finishing it but hasn't
> yet.  We should re-energize that.

>From the perspective of bug 1218297, that change doesn't actually
affect anything.

> There's considerable discussion in the bug of when leaks occur and also
> the assumed behavior of DispatchToMainThread (which is especially
> failure-prone because of how XPCOM dispatch works - even when MainThread
> still exists, that can fail in shutdown).

Ok, cool.  I didn't want to read the entire thing :)

>>> In bug 1218297 we saw a case where dispatch to a thread (the socket
>>> transport service thread in this case) fails because the thread has
>>> already shut down.  In our brave new world, nsThread simply leaks the
>>> runnable.
>
> Yup.  In cases where we anticipate a possible Dispatch failure (which is
> supposed to become impossible, but isn't currently) you can use the
> (still-existing) raw ptr interface and handle Dispatch failure
> explicitly to release the (leaked) reference, if it's safe.  Not all
> cases are safe to release in that case (which is what drove the initial
> bug filing, where it tried to release JS objects on Dispatch failure off
> mainthread).  Leaking is better than crashing/sec-bugs.

No, you can't.  If you can the raw ptr interface today Dispatch will
create its own reference and pass that to the already_AddRefed version
which then leaks it.  There's no way for the caller to handle this
safely.  Again, as karlt points out, Dispatch leaks today even if the
caller does everything correctly.

Leaking is better than crashing/sec-bugs, but that doesn't mean that
we get to introduce leaks to fix crash/sec-bugs, especially when it's
possible to fix the bug through other means.

> If the problem is that when this happens, a leak is reported by infra,
> then we could ping the leak-checker that there was a dispatch failure
> and leaks are expected.  Even better but maybe not possible would be
> annotate the root of the leak and suppress anything tied to it.

The problem is not that a leak is reported, the problem is that we
leak.  Automatically hiding our bugs is not good engineering practice.

>>> It can't release the reference it holds, because that would
>>> reintroduce the race condition we wanted to avoid, and it can't
>>> release the reference on the correct thread so it's already gone away.
>>> But now we have a new source of intermittent leaks.
>>>
>>> Was this anticipated when designing bug 1155059?  I don't think
>>> leaking is acceptable here, so we may need to back that out and return
>>> to living with that race condition.
>>
>>I agree this approach is not going to work out.  Short term, I
>>think the situation could be improved and most of those changes
>>kept by adding a DispatchOrLeak() variant that can be used by the
>>callers that want to leak.  Then we still have the leaks we had
>>prior to 2265e031ab97 but the new ones are resolved.
>>
>>For the remaining (old) leaks I can think of two options:
>>
>>1) Never call Dispatch() when it might fail and assert that it
>>   does not.
>>
>>2) Keep and additional reference to the runnable in the caller if
>>   it doesn't want Dispatch() to release the last reference.
>>
>>With the former, the caller needs to ensure the thread lives long
>>enough for the Dispatch() to succeed, or that clean up happens
>>before the thread is closed.  With the latter, the caller needs to
>>find another way to release when Dispatch() fails.
>>
>>Making Dispatch() infallible would permit only option 1.  Keeping
>>Dispatch() fallible allows the caller to use either option.
>
> There's another (likely impractical) option: wrap all thread-sensitive
> pointers in a ReleaseRefOnThread<> holder which refuses to release
> objects on the wrong thread and RELEASE_ASSERTS, and integrate that with
> Dispatch's failure path to not assert if you try to release it due to a
> Dispatch failure and instead leak and mark it somehow for automation.

If you're going to do this you could just fix the runnables to not
make assumptions about what threads they are released on, like we did
prior to changing the dispatch signature.

> Leaks are annoying, but only because of automation meant to catch
> persistent/accumulating leaks - shutdown-time-only leaks are far less
> concerning, and (waving hands) 99% of Dispatch()-failures that create
> leaks are in shutdown.

We have no data upon which to wave those hands.

Furthermore leaking a single DOM object is usually enough to leak an
entire page and start clogging up the cycle collector, producing
severe performance degradation over time.

> Also - blocking event processing and DispatchToMainthread before the
> final cycle-collect adds to the problem since anything released in the
> final CC pass can't DispatchToMainthread; this has forced a number of
> tricks or bits of code to handle Dispatch failure.

Yes, the timing of the final cycle collection is an issue.  It's not
clear that there's any good solution there, but it is a separate
problem, I think.

> Once upon a time we didn't do very much with other threads, and also
> didn't pass JS references around, etc.  We use threads a lot more and in
> much more complex ways now.
>
> As Nathan said in the bug:
>> Well, the runnables themselves are certainly safe to refcount
>> anywhere, but the contents of the runnables might not be...
>
> Sec bugs and crashes are worse than leaks, and they can occur
> if we have non-testable random off-thread releases, which is what we
> had, and still do for things not switched to already_AddRefed<>.

Fixing bugs that you can fix through other means is not a
justification for breaking behavior that we've had for 15+ years
without auditing the callsites that depend on it.

> I'd be good with flagging/asserting on Dispatch failures *before* shutdown
> begins as a step to making it infallible, since those have much more
> potential to be logic flaws and/or create accumulating leaks.

So who is going to do that, and who is on the hook for driving
infallible dispatch to completion?

- Kyle
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Reply via email to