On Fri, Feb 24, 2012 at 8:14 PM, Amarin Phaosawasdi <amari...@samegoal.com> wrote: > Hello, > > About a year ago, there was a thread about deadlocking in > bufferevent_openssl in multi-threaded mode > (http://archives.seul.org/libevent/users/Jan-2011/msg00019.html). > > We've run into the same problem. > > Our minimal fix (as opposed to the patch mentioned, which adds atomic > operations to evthread) for our specific use case was this : > - expose event_base_lock() and event_base_unlock() to library users > - in the user code make sure to lock the event_base before writing to an > openssl bufferevent, so no callbacks can be fired during that write (which > results in a deadlock). > > Does anybody have feedback on better ways to workaround this?
So, this wants a general fix, not just a workaround. Right now, I am not sure what the general fix is. I started writing up some notes, but I don't think they actually point to an obvious conclusion yet. More insight would be welcome! First off, the bufferevents are a red herring: the problem is the way that Libevent handles attempts to event_del an event whose callback is currently running in a different thread. Right now, Libevent makes the thread that is not running the callback wait until the callback is done running. (This same behavior happens with event_add and event_active on signal events.) The problem shows up when we have other locks, like those used by bufferevents. If the thread calling event_del() holds a lock on some resource that the callback wants, then we wind up in a deadlock where the event_del() thread is waiting for the callback to finish, and the callback is waiting for the lock. So why do this? Why wait at all? When I wrote that code (back in 6b4b77a26), I think the rationale was this: there is lots of code where you want to event_del() an event, free it, and free the data structure that was in the event's callback_arg. When using Libevent in a single-threaded mode, this is completely safe, since either the event is not running, or you are doing all of this from the event's callback and you presumably know not to use the callback_arg after it's freed. But if your program is multithreaded, you run into a problem: the callback might be running at the time, merrily using the callback_arg and event you're about to free. So the thread doing the event_del() might free a callback_arg or an event that was in use. Here are all the fixes I can think of quickly, in brainstorming order: - Remove the wait entirely: make it the responsibility of the multithreaded program to ensure that the event callback is not running before you free the event's callback_arg. But the problem here is, how can you do that? You can event_del() the event to make sure that it is not pending or not active, but if the callback is running, it's running. - Remove the wait, and give multithreaded programs a way to make their cleanup happen _after_ the event is done running (if it's running). One sketch interface would make event_free() detect whether the event is running. If so, event_free() might return without freeing the event. Once the callback was finished, another callback would run to do whatever it was that the deleting thread hoped to do. We'd need a corresponding event_unassign() for events not created with event_new(). - What else? Obviously, anything here that would place a new requirement on multithreaded programs should be written in a way so that currently working programs won't break because of it. yrs, -- Nick *********************************************************************** To unsubscribe, send an e-mail to majord...@freehaven.net with unsubscribe libevent-users in the body.