In general, I think making the public interface to OpenRTE not thread
safe is a reasonable thing to do.  However, I have some concerns about
how this would work with the event library.  When the project is
compiled with progress threads, the event library runs in its own
thread.  More important to this discussion, all callbacks from the
event library are triggered in the event library's thread (not the
thread that registered the event).  That makes it very likely that the
GPR could get a callback from a non-blocking OOB receive in a thread
other than the application's main thread, and that this could happen
while the main thread is already in the GPR.
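
To make the concern concrete, here is a minimal sketch, assuming POSIX
threads.  Every name in it (fake_gpr_put, caller_lock, event_thread,
run_demo) is hypothetical and stands in for the real code; none of it
is actual OpenRTE or event library API.  It only shows that a
caller-owned lock protects a non-thread-safe entry point when the
callback path in the other thread takes the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CALLS_PER_THREAD 100000

/* Hypothetical stand-in for a non-thread-safe OpenRTE entry point
 * (NOT a real GPR function): an unprotected read-modify-write. */
static long registry_entries = 0;

static void fake_gpr_put(void)
{
    registry_entries = registry_entries + 1;
}

/* Lock owned by the calling layer, as the proposal suggests. */
static pthread_mutex_t caller_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simulates the event library's progress thread delivering callbacks
 * for non-blocking OOB receives.  The callback enters the "GPR" from
 * this thread, so it has to take the caller's lock too. */
static void *event_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++) {
        pthread_mutex_lock(&caller_lock);
        fake_gpr_put();
        pthread_mutex_unlock(&caller_lock);
    }
    return NULL;
}

/* Runs the "main thread" and the simulated event thread concurrently
 * and returns the final entry count (-1 if the thread cannot start). */
long run_demo(void)
{
    pthread_t tid;
    int rc;

    registry_entries = 0;
    if (pthread_create(&tid, NULL, event_thread, NULL) != 0) {
        return -1;
    }

    /* Main application thread calling into the "GPR". */
    for (int i = 0; i < CALLS_PER_THREAD; i++) {
        pthread_mutex_lock(&caller_lock);
        fake_gpr_put();
        pthread_mutex_unlock(&caller_lock);
    }

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return registry_entries;
}
```

Calling run_demo() from a small main() (built with -pthread) exercises
both threads.  If the lock calls in event_thread are removed, the two
threads race on registry_entries, which is the situation I am worried
about: the caller cannot easily wrap a lock around a callback that is
dispatched from inside the event library.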

I'm not sure what the best way to handle this would be, but I don't
think you could do it at the event level without making adjustments
that would prohibit concurrency at the MPI layer, which is obviously
sub-optimal.

Of course, we could modify the code so that non-OMPI applications didn't
start the event progress thread, but that wouldn't solve the MPI-layer
issues.

Brian

On Fri, 2006-08-25 at 14:14 -0600, Ralph Castain wrote:

> There has been ongoing discussion for some time about the thread safety of
> OpenRTE. This note proposes a solution to that problem that has been kicked
> around for the last several months, and that Jeff and I feel makes a certain
> degree of sense.
> 
> Short description
> -------------------------
> We propose to make OpenRTE appear "single-threaded" to outside users. By
> that we do not mean that OpenRTE may not have some internal threads in
> operation. Instead, we mean that thread locking would be the responsibility
> of anyone calling an OpenRTE function - as opposed to built into the OpenRTE
> system itself.
> 
> Explanation
> -------------------------
> Most of the logic inside of OpenRTE is serial in nature and therefore
> resistant to the use of threads. Accordingly, we find ourselves putting
> giant thread locks around virtually every function in the code base. This
> wastes our time, complicates the code (we all keep forgetting to unlock when
> exiting due to errors), and basically eliminates any benefits from threading
> anyway.
> 
> Those few places where threading is possible are actually involved in
> OpenRTE-internal operations. For example, we now use a thread to accept
> out-of-band communication socket connections. These operations, however, are
> transparent to the user level (i.e., any code that calls OpenRTE).
> 
> It seems, therefore, that the simplest solution is to place the
> responsibility for thread locking onto the calling programs. Unthreaded
> programs need do nothing. Programs utilizing threads, however, would need to
> thread lock prior to calling OpenRTE functions.
> 
> Any comments on this idea? If not, or if there is general consensus on this
> approach, then we would gradually remove the current thread locks as code is
> revised - this isn't a high priority issue requiring an immediate scrub of
> the code.



