Folks,

I would like to comment on r31738:
> There is no reason to cancel the listening thread. It should die
> automatically when the file descriptor is closed.

I could not agree more.

> It is sufficient to just wait for the thread to exit with pthread join.

Unfortunately, at least in my test environment (an outdated MPSS 2.1), it is *not* :-(

This is what I described in #4615:
https://svn.open-mpi.org/trac/ompi/ticket/4615
I attached scif_hang.c to that ticket. It shows that, at least in my environment, scif_poll(...) does *not* return after scif_close(...) is called, so the scif pthread never ends.

This is likely a bug in MPSS, and it might have been fixed in a more recent release.
Nathan, could you try scif_hang in your environment and report the MPSS version you are running?

Bottom line, once again: in my test environment, calling pthread_join(...) without pthread_cancel(...) can hang when the btl/scif module is released.

Assuming the bug is in old MPSS and has been fixed in recent releases, what is the OpenMPI policy?
a) test the MPSS version and, if a buggy MPSS is detected, call pthread_cancel() or do *not* call pthread_join()?
b) display an error/warning if a buggy MPSS is detected?
c) do not call pthread_join() at all? /* SIGSEGV might occur with older MPSS, but it is in MPI_Finalize(), so the impact is limited */
d) do nothing and let the btl/scif module hang, since this is *not* an OpenMPI problem after all?
e) something else?

Gilles
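For reference, option a) comes down to the cancel-then-join sequence that r31738 removed. The following is a minimal sketch under stated assumptions. SCIF is only available with MPSS, so scif_poll() is replaced here by POSIX poll() on a pipe whose read end never becomes readable, standing in for the reported hang. The names listen_thread and stop_listener_with_cancel are hypothetical and are not the btl/scif code. The sketch relies on poll() being a POSIX cancellation point. It does not establish whether scif_poll() behaves the same way.

```c
#include <pthread.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for the btl/scif listening thread: it blocks in
 * poll() on a descriptor that never becomes readable, mimicking a
 * scif_poll() that keeps blocking after scif_close() (assumption). */
static void *listen_thread(void *arg)
{
    struct pollfd pfd;
    pfd.fd = *(int *)arg;
    pfd.events = POLLIN;
    pfd.revents = 0;
    for (;;) {
        /* poll() is a POSIX cancellation point, so a pending deferred
         * cancellation request is acted upon here. */
        (void)poll(&pfd, 1, -1);
    }
    return NULL; /* not reached */
}

/* Stops the listener with pthread_cancel() followed by pthread_join().
 * Returns 1 if the joined thread was cancelled, 0 otherwise,
 * and -1 if the pipe or the thread could not be created. */
int stop_listener_with_cancel(void)
{
    int fds[2];
    pthread_t tid;
    void *status = NULL;

    if (pipe(fds) != 0)
        return -1;
    /* The write end stays open, so the read end never becomes readable
     * and the listener would block forever without cancellation. */
    if (pthread_create(&tid, NULL, listen_thread, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pthread_cancel(tid);          /* request deferred cancellation */
    pthread_join(tid, &status);   /* returns once the thread has exited */
    close(fds[0]);
    close(fds[1]);
    return status == PTHREAD_CANCELED ? 1 : 0;
}
```

Calling stop_listener_with_cancel() should return 1, because the joined thread's exit status is PTHREAD_CANCELED. If the pthread_cancel() call is removed, pthread_join() blocks forever, which mirrors the hang described above for MPSS 2.1.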