I have heard multiple references to pthread_cancel being known to have bad
side effects. Can somebody educate me on this topic, please?

  Thanks,
    George.



On Tue, May 13, 2014 at 10:25 PM, Ralph Castain <r...@open-mpi.org> wrote:
> It could be a bug in the software stack, though I wouldn't count on it. 
> Unfortunately, pthread_cancel is known to have bad side effects, and so we 
> avoid its use.
>
> The key here is that the thread must detect that the file descriptor has 
> closed and exit, or use some other method for detecting that it should 
> terminate. We do this in multiple other places in the code, without using 
> pthread_cancel and without hanging. So it is certainly doable.
>
> I don't know the specifics of why Nathan's code is having trouble exiting, 
> but I suspect that a simple solution - not involving pthread_cancel - can be 
> readily developed.
>
>
> On May 13, 2014, at 7:18 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> i would like to comment on r31738 :
>>
>>> There is no reason to cancel the listening thread. It should die
>>> automatically when the file descriptor is closed.
>> i could not agree more
>>> It is sufficient to just wait for the thread to exit with pthread join.
>> unfortunately, at least in my test environment (an outdated MPSS 2.1) it
>> is *not* :-(
>>
>> this is what i described in #4615
>> https://svn.open-mpi.org/trac/ompi/ticket/4615
>> in which i attached scif_hang.c, which shows that (at least in my
>> environment)
>> scif_poll(...) does *not* return after scif_close(...) is called, and
>> hence the scif pthread never ends.
>>
>> this is likely a bug in MPSS and it might have been fixed in a later
>> release.
>>
>> Nathan, could you try scif_hang in your environment and report the MPSS
>> version you are running ?
>>
>>
>> bottom line, and once again, in my test environment, pthread_join (...)
>> without pthread_cancel(...)
>> might cause a hang when the btl/scif module is released.
>>
>>
>> assuming the bug is in old MPSS and has been fixed in recent releases,
>> what is the OpenMPI policy ?
>> a) test the MPSS version and call pthread_cancel() or do *not* call
>> pthread_join if buggy MPSS is detected ?
>> b) display an error/warning if a buggy MPSS is detected ?
>> c) do not call pthread_join at all ? /* SIGSEGV might occur with older
>> MPSS, it is in MPI_Finalize() so impact is limited */
>> d) do nothing, let the btl/scif module hang, this is *not* an OpenMPI
>> problem after all ?
>> e) something else ?
>>
>> Gilles
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/05/14786.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14787.php
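
For reference, here is a minimal sketch of the approach Ralph describes
above: the thread itself notices that it should terminate, and the owner
only calls pthread_join, never pthread_cancel. This is not the btl/scif
code. It uses plain POSIX poll() and a wake-up pipe instead of scif_poll(),
and the names listener_start, listener_stop and listener_demo are made up
for illustration. Whether scif_poll() can watch such a pipe, or wakes up
the same way on the MPSS versions discussed in #4615, is an assumption
that is not verified here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical listener state; the names are illustrative, not Open MPI's. */
struct listener {
    pthread_t thread;
    int data_fd;             /* descriptor the thread services (may be -1) */
    int wake_fd[2];          /* pipe: [0] is polled, [1] is written to stop */
    int stopped_on_request;  /* set by the thread, read after pthread_join */
};

static void *listener_main(void *arg)
{
    struct listener *l = arg;
    struct pollfd fds[2];

    fds[0].fd = l->wake_fd[0];
    fds[0].events = POLLIN;
    fds[1].fd = l->data_fd;  /* poll() ignores negative descriptors */
    fds[1].events = POLLIN;

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;    /* interrupted by a signal: retry */
            break;
        }
        if (fds[0].revents != 0) {   /* the owner asked us to stop */
            l->stopped_on_request = 1;
            break;
        }
        if (fds[1].revents & (POLLHUP | POLLERR | POLLNVAL))
            break;           /* peer hung up or descriptor is invalid */
        if (fds[1].revents & POLLIN) {
            char buf[64];
            if (read(l->data_fd, buf, sizeof buf) <= 0)
                break;       /* EOF or error: exit the thread */
        }
    }
    return NULL;
}

static int listener_start(struct listener *l, int data_fd)
{
    assert(l != NULL);
    l->data_fd = data_fd;
    l->stopped_on_request = 0;
    if (pipe(l->wake_fd) != 0)
        return -1;
    if (pthread_create(&l->thread, NULL, listener_main, l) != 0) {
        close(l->wake_fd[0]);
        close(l->wake_fd[1]);
        return -1;
    }
    return 0;
}

/* Wake the thread through the pipe, then wait for it: no pthread_cancel. */
static int listener_stop(struct listener *l)
{
    char byte = 0;

    if (write(l->wake_fd[1], &byte, 1) != 1)
        return -1;
    if (pthread_join(l->thread, NULL) != 0)
        return -1;
    close(l->wake_fd[0]);
    close(l->wake_fd[1]);
    return 0;
}

/* Returns 0 if the thread exited because it was asked to. */
int listener_demo(void)
{
    struct listener l;

    if (listener_start(&l, -1) != 0)
        return 1;
    if (listener_stop(&l) != 0)
        return 1;
    return l.stopped_on_request ? 0 : 1;
}
```

Calling listener_demo() from a main() and building with -pthread should be
enough to try it. The point of the pipe is that the join cannot hang on a
poll call that never notices a closed descriptor, which is the failure
mode reported in the ticket.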
