On Tue, 6 May 2008, Jeff Squyres wrote:

> On May 5, 2008, at 6:27 PM, Steve Wise wrote:
>
>> There is a larger question regarding why the remote node is still
>> polling the hca and not shutting down, but my immediate question is
>> if it is an acceptable fix to simply disregard this "error" if it
>> is an iWARP adapter.
>
> If proc B is still polling the hca, it is likely because it simply has
> not yet stopped doing it.  I.e., a big problem in MPI implementations
> is that not all actions are exactly synchronous.  MPI disconnects are
> *effectively* synchronous, but we probably didn't *guarantee*
> synchronicity in this case because we didn't need it (perhaps until
> now).

Not to mention... The BTL has to be able to handle a shutdown from one
proc while still running its progression engine, as that's a normal
sequence of events when dynamic processes are involved.  Because of
that, not much care was taken to ensure that everyone stopped polling
before everyone did del_procs.
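
To make that concrete, here is a rough sketch of a progress loop that
keeps running while some peers are being torn down.  Everything in it
(peer_state_t, completion_t, progress_stats_t, progress()) is made up
for illustration and is not the actual openib BTL code or the verbs
API; it only assumes that each polled completion can be mapped back to
a peer whose teardown state is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical peer states; not Open MPI's real data structures. */
typedef enum { PEER_CONNECTED, PEER_CLOSING, PEER_REMOVED } peer_state_t;

/* Hypothetical completion record, loosely modeled on a polled CQ entry. */
typedef struct {
    size_t peer;  /* index of the peer this completion belongs to */
    int    ok;    /* nonzero on success, zero on error (e.g. a flush) */
} completion_t;

typedef struct {
    int delivered;  /* successful completions passed upward */
    int ignored;    /* completions dropped because the peer is going away */
    int fatal;      /* errors on peers that should still be healthy */
} progress_stats_t;

/*
 * Drain one batch of completions.  An error from a peer that is closing
 * or already removed is treated as an expected part of teardown and is
 * dropped; an error from a connected peer is counted as fatal.
 */
static void progress(const peer_state_t *peers, size_t npeers,
                     const completion_t *events, size_t nevents,
                     progress_stats_t *stats)
{
    for (size_t i = 0; i < nevents; i++) {
        assert(events[i].peer < npeers);
        peer_state_t state = peers[events[i].peer];

        if (events[i].ok) {
            /* Late successes for a removed peer have nowhere to go. */
            if (state == PEER_REMOVED) {
                stats->ignored++;
            } else {
                stats->delivered++;
            }
        } else if (state == PEER_CONNECTED) {
            stats->fatal++;
        } else {
            stats->ignored++;
        }
    }
}
```

The point of keying on per-peer state rather than on the adapter type
is that the same loop then covers any proc that goes away while others
keep polling.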

Brian
