On Thu, Nov 5, 2009 at 1:15 PM, Jeremy Orlow <jor...@chromium.org> wrote:

> On Mon, Nov 2, 2009 at 9:50 PM, John Abd-El-Malek <j...@chromium.org>wrote:
>
>> Over the last week, I've been making some changes to how threads are used
>> in the browser process, with the goal of simplifying cross-thread access and
>> improving stability.
>>
>> *The problem*
>> We were using a number of patterns that were problematic:
>>
>> 1) using ChromeThread::GetMessageLoop
>>   -this isn't safe, since it could return a valid pointer, but since the
>> caller isn't holding on to a lock anymore, the target MessageLoop could be
>> destructed while it's being used
>>
>> 2) caching of MessageLoop pointers in order to use them later for PostTask
>> and friends
>>   -this was more efficient previously (more on that later) since using
>> ChromeThread::GetMessageLoop involved a lock
>>   -but it spread logic about the order of thread destruction all over the
>> code.  Code that moved from the IO thread to the file thread and back, in
>> order to avoid doing disk access on the IO thread, ended up having to do an
>> extra hop through the UI thread on the way back to the IO thread since the
>> file thread outlives the Io thread.  Of course, most code learnt this the
>> hard way, after doing the straight forward IO->file->IO thread hop and
>> updating the code after seeing reliability or user crashes
>>   -it made the browser shutdown fragile and hence difficult to update
>>
>> 3) file thread hops using RefCountedThreadSafe objects which have
>> non-trivial destructors
>>   -to reduce jank, frequently an object on the UI or IO thread would
>> execute some code on the file thread, then post the result back to the
>> original thread.  We make this easy using RefCountedThreadSafe and
>> NewRunnableMethod so this pattern happened often, but it's not always
>> safe: NewRunnableMethod holds an extra reference on the object to ensure
>> that it doesn't invoke a method on a deleted object.  But it's quite
>> possible that before the file thread's stack unwinds and it releases the
>> extra reference, that the response task on the original thread executed and
>> released its own additional reference.  The file thread is then left with
>> the remaining reference and the object gets destructed there.  While for
>> most objects this is ok, many have non-trivial destructors, with the worst
>> being ones that register with the per-thread NotificationService.  Dangling
>> pointers would be left behind and tracking these crashes from ChromeBot or
>> the crash dumps has wasted several days at least for me.
>>
>> 4) having browser code take different code paths if a thread didn't exist
>>   -this could be either deceptively harmless  (i.e. execute synchronously
>> when it was normally asynchronous), when in fact it makes shutdown slower
>> because disk access is moved to the UI thread
>>   -it could lead to data loss, if tasks are silently not posted because
>> the code assumes this only happens in unit tests, when it could occur on
>> browser shutdown as well
>>
>>
>> *The solution*
>> 1+2: Use ChromeThread::PostTask and friends (i.e. PostDelayedTask,
>> DeleteSoon, ReleaseSoon) which are safe and efficient: no locks are grabbed
>> if the target thread is known to outlive the current thread.  The four
>> static methods have the same signature as the ones from MessageLoop, with
>> the addition of the first parameter to indicate the target thread.
>>
>> ChromeThread::PostTask(ChromeThread::FILE, FROM_HERE, task);
>>
>> 3: If your object must be destructed on a specific thread, use a trait
>> from ChromeThread:
>>
>> class Foo : public base::RefCountedThreadSafe<Foo,
>> ChromeThread::DeleteOnIOThread>
>>
>
> This is really cool and a great idea, but I'm a little concerned about what
> happens when the thread has already been torn down.  It seems that it calls
> DeleteSoon which calls PostNonNestableTask which calls PostTaskHelper.
>  PostTaskHelper deletes the task if the thread is shut down.
>
> This works fine if something is only supposed to touch 2 threads.  But what
> if 2 threads simultaneously delete something which is supposed to be deleted
> on a third thread that's already been shut down?
>

I'm not sure which object that you're referring to?  The PostTaskHelper will
delete the task.  But if you have a DeleteSoon task, deleting the task (i.e.
because the target thread is gone) doesn't delete the object.

As for simultaneously deleting an object, if more than 1 thread are
accessing an object, they should each have a reference to it.  They can't
ensure that releasing their reference will cause deletion, since there could
be other refences.


>  And what if they both call a method on another class that's not thread
> safe?
>

If they use NewRunnableMethod on an object that's not thread safe, then the
result task would only execute on the target thread.  If the task couldn't
execute because the target thread is gone, then the method won't be invoked.


>  I ask because this scenario is going to prevent me from using this for DOM
> Storage.
>
> A possible solution would be to guarantee that, if the thread is dead,
> destruction happens on the UI thread.  At least in this case, I think it
> would work.
>
> Thoughts?
>

DeleteOnIOThread and the other variants were added specifically to enforce
that an object is deleted on a specific thread.  Without this, as an example
objects that use the NotificationService (which is per thread) won't be able
to unregister themselves, which would lead to corruption later since the
service will call out to deleted objects.  So no, we wouldn't want to delete
these objects on the UI thread because the problem would still exist.

>
>
>> 4: I've removed all the special casing and always made the objects in the
>> browser code behave in one way.  If you're writing a unit test and need to
>> use an object that goes to a file thread (where before it would proceed
>> synchronously), you just need:
>>
>> ChromeThread file_thread(ChromeThread::FILE, MessageLoop::current());
>> foo->StartAsyncTaskOnFileThread();
>> MessageLoop::current()->RunAllPending();
>>
>> There are plenty of examples now in the tests.
>>
>>
>> *Gotchas*
>> -PostTask will silently delete a task if the thread doesn't exist.  This
>> is done to avoid having all the code that uses it have special cases if the
>> target thread exists or not, and to make Valgrind happy.  As usual, the task
>> for DeleteSoon/ReleaseSoon doesn't do anything in its destructor, so this
>> won't cause unexpected behavior with them.  But be wary of posted Task
>> objects which have non-trivial destructors or smart pointers as members.
>>  I'm still on the fence about this, since while the latter is theoretical
>> now, it could lead to problems in the future.  I might change it so that the
>> tasks are not deleted when I'm ready for more Valgrind fun.
>>
>> If you absolutely must know if a task was posted or not, you can check the
>> return value of PostTask and friends.  But note that even if the task was
>> posted successfully, there's no guarantee that it will run because the
>> target thread could already have a QuitTask queued up.
>>
>> g_browser->io_thread() and file_thread() are still around (the former for
>> IPC code, and the latter for Linux proxy code which is in net and so can't
>> use ChromeThread).  Don't use them unless absolutely necessary.
>>
>>
>> *The Catch*
>> Ensuring we don't fall back into our old patterns through code you write
>> and review.
>>
>>
>> *More information*
>> http://code.google.com/p/chromium/issues/detail?id=25354
>>
>> >>
>>
>

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---

Reply via email to