Hello all

sorry for asking too many 101 questions; hopefully someone won't mind
answering.

It looks like, as of the current release, some BTLs (e.g. openib) are not
thread safe, and the code explicitly bails out if it finds that
MPI_Init_thread() was called with THREAD_MULTIPLE. Then there are some
BTLs, such as TCP, that can handle THREAD_MULTIPLE. Here are my questions:

1. There must be global (shared) variables that the BTL layer is accessing,
which is what gives rise to the thread-safety problem. Is there a list of
such variables, the code paths in which they are accessed, and/or any
documentation on them (including any past mailing list posts)?

2. Browsing through the mailing list (I have been a subscriber to the
*user* list for quite a while), it looks like a lot of people have stumbled
onto the issue that the openib BTL is not thread safe. Given that it is, I
presume, the most popular BTL, since InfiniBand-like fabrics hold a lion's
share of the HPC interconnect market, it must be quite difficult to make it
thread safe. Any comments on the level of work it would take to make sure
a new BTL is thread safe? Something along the lines of a 'do-this' or
'don't-do-that' list would be greatly appreciated.

3. It looks like the openib BTL bailing out if called with THREAD_MULTIPLE
has been removed in the master branch (at least from a cursory look.) Does
that mean that the openib BTL is now thread safe, or is it that the check
has simply been moved to another location?

Thanks in advance
Durga

Life is complex. It has real and imaginary parts.
