Couple of suggestions:

* detect that this is an older scif lib and just don't build or enable the scif btl
* have a flag that indicates "you should exit", and then tickle the fd so scif_poll exits

Ralph

On May 14, 2014, at 7:45 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> Looks like this is a scif bug. From the documentation:
>
> scif_poll() waits for one of a set of endpoints to become ready to perform an
> I/O operation; it is syntactically and semantically very similar to poll().
> The SCIF functions on which scif_poll() waits are scif_accept(), scif_send(),
> and scif_recv(). Consult the SCIF API reference manuals for details on
> scif_poll() usage.
>
> So, if it is indeed similar to poll() it should wake up when the file
> descriptor is closed.
>
> Since that is not the case I will look through the documentation and see
> if there is a way other than pthread_cancel.
>
> -Nathan
>
> On Wed, May 14, 2014 at 11:18:05AM +0900, Gilles Gouaillardet wrote:
>> Folks,
>>
>> i would like to comment on r31738 :
>>
>>> There is no reason to cancel the listening thread. It should die
>>> automatically when the file descriptor is closed.
>> i could not agree more
>>> It is sufficient to just wait for the thread to exit with pthread join.
>> unfortunately, at least in my test environment (an outdated MPSS 2.1) it
>> is *not* :-(
>>
>> this is what i described in #4615
>> https://svn.open-mpi.org/trac/ompi/ticket/4615
>> in which i attached scif_hang.c that evidences that (at least in my
>> environment)
>> scif_poll(...) does *not* return after scif_close(...) is called, and
>> hence the scif pthread never ends.
>>
>> this is likely a bug in MPSS and it might have been fixed in a later
>> release.
>>
>> Nathan, could you try scif_hang in your environment and report the MPSS
>> version you are running ?
>>
>>
>> bottom line, and once again, in my test environment, pthread_join (...)
>> without pthread_cancel(...)
>> might cause a hang when the btl/scif module is released.
>>
>>
>> assuming the bug is in old MPSS and has been fixed in recent releases,
>> what is the OpenMPI policy ?
>> a) test the MPSS version and call pthread_cancel() or do *not* call
>> pthread_join if buggy MPSS is detected ?
>> b) display an error/warning if a buggy MPSS is detected ?
>> c) do not call pthread_join at all ? /* SIGSEGV might occur with older
>> MPSS, it is in MPI_Finalize() so impact is limited */
>> d) do nothing, let the btl/scif module hang, this is *not* an OpenMPI
>> problem after all ?
>> e) something else ?
>>
>> Gilles
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/05/14786.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14797.php
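
For reference, the "exit flag plus tickle the fd" suggestion at the top of this message could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses plain POSIX poll() and pipes as a stand-in, because whether scif_poll() on the affected MPSS releases can wait on an ordinary pipe fd alongside a SCIF endpoint is not confirmed anywhere in this thread and would need checking against the SCIF documentation. The names listener_start(), listener_stop(), wake_pipe and idle_pipe are hypothetical and are not taken from the btl/scif code.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the btl/scif implementation. */
static atomic_bool should_exit;   /* the "you should exit" flag */
static int wake_pipe[2];          /* [0] is polled, [1] is written at shutdown */
static int idle_pipe[2];          /* stand-in for the listening endpoint */
static pthread_t listener_tid;

static void *listener_main(void *arg)
{
    (void)arg;
    struct pollfd fds[2] = {
        { .fd = idle_pipe[0], .events = POLLIN },  /* never becomes readable here */
        { .fd = wake_pipe[0], .events = POLLIN },  /* readable once we are tickled */
    };

    while (!atomic_load(&should_exit)) {
        if (poll(fds, 2, -1) < 0) {        /* block with no timeout */
            if (errno == EINTR) continue;  /* interrupted by a signal: retry */
            break;
        }
        if (fds[1].revents & POLLIN) {
            char c;
            if (read(wake_pipe[0], &c, 1) < 0) break;  /* drain the wake-up byte */
        }
        /* A real listener would accept connections on fds[0] here. */
    }
    return NULL;
}

/* Create the pipes and start the listening thread. Returns 0 on success. */
int listener_start(void)
{
    atomic_store(&should_exit, false);
    if (pipe(wake_pipe) != 0) return -1;
    if (pipe(idle_pipe) != 0) return -1;
    return pthread_create(&listener_tid, NULL, listener_main, NULL) == 0 ? 0 : -1;
}

/* Set the flag, tickle the fd, then join. No pthread_cancel is needed. */
int listener_stop(void)
{
    atomic_store(&should_exit, true);
    if (write(wake_pipe[1], "x", 1) != 1) return -1;
    if (pthread_join(listener_tid, NULL) != 0) return -1;
    close(wake_pipe[0]); close(wake_pipe[1]);
    close(idle_pipe[0]); close(idle_pipe[1]);
    return 0;
}
```

The point of the design is that the join no longer depends on the poll call noticing a closed descriptor: the thread is woken by data arriving on an fd it is already waiting on, and it checks the flag before blocking again. Module setup would call listener_start() and module release would call listener_stop().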