I didn't say they eventually wouldn't, George. I was trying to indicate that 
they may not be there yet, and our current code has been tested with their 
current releases - not what they eventually might release.

As to who wanted this "standard"...I was there during the discussions about 
whether or not to submit the interface to the forum, and I recall the pressure 
coming from outside the tool vendors. If anything, certain parties twisted 
their arms into it.

Regardless, I only caution about making changes without first checking that 
others believe it is broken...and I don't recall seeing that question raised to 
the community (and time given for a response) prior to the changes being 
committed. :-)


On Nov 7, 2011, at 8:20 PM, George Bosilca wrote:

> They had better conform to what they asked us to "accept". It wasn't that the 
> MPI Forum members were eager to put the tool interface into the standard; we 
> were kind of forced to. By whom … well, by the tool vendors, to promote a 
> certain homogeneity.
> 
>  george.
> 
> On Nov 7, 2011, at 20:34 , Ralph Castain wrote:
> 
>> I can't speak to what is in ompi_debuggers.c as I believe Jeff wrote most of 
>> that. However, what is there has been tested and works with TotalView and a 
>> couple of other debuggers.
>> 
>> Best guess: from what I've seen, most debuggers don't seem to conform to 
>> what the MPI Forum has "accepted". It doesn't appear that the vendors and 
>> debugger developers pay too much attention to that document, possibly 
>> because it (a) came after the debuggers were developed, and (b) still 
>> doesn't seem to be widely adopted.
>> 
>> I'd suggest being a little careful about making changes without consulting 
>> people who use TV and "stat", at least - those are the ones most recently 
>> tested.
>> 
>> 
>> On Nov 7, 2011, at 5:59 PM, George Bosilca wrote:
>> 
>>> I was trying to understand how the debugger interface is supposed to work. 
>>> And if I was confused before, that feeling never disappeared.
>>> 
>>> There is one thing that I really can't figure out, and I hope that somebody 
>>> (Jeff/Ralph/Rolf based on svn blame) can enlighten me.
>>> 
>>> MPIR_debug_gate. In the document accepted by the MPI Forum we have the 
>>> following definition:
>>> 
>>>> MPIR_debug_gate is an integer variable that is set to 1 by the tool to 
>>>> notify the MPI
>>>> processes that the debugger has attached. An MPI process may use this 
>>>> variable as a
>>>> synchronization mechanism to prevent it from running away before the tool 
>>>> has time to
>>>> attach to the process.
>>>> 
>>>> An MPI implementation is not required to use the MPIR_debug_gate variable 
>>>> for synchronization. However, the MPI job control runtime system must 
>>>> prevent the created MPI
>>>> processes from running beyond the return from the applications call to 
>>>> MPI_INIT.
>>> 
>>> In case it is not clear enough, in the section describing the startup 
>>> process, we can find the following clarification:
>>> 
>>>> If the symbol MPIR_partial_attach_ok is defined in the starter process, 
>>>> then this
>>>> informs the tool that the initial startup barrier is implemented by the 
>>>> MPI system,
>>>> and it is not necessary to set the MPIR_debug_gate variable in any of MPI 
>>>> processes.
>>>> However, if the symbol MPIR_partial_attach_ok is not defined in the starter 
>>>> process,
>>>> the tool must attach and set the MPIR_debug_gate variable to 1 in each MPI 
>>>> processes
>>>> to release them from the gate, even if the tool user has instructed the 
>>>> tool to not attach
>>>> to all of the MPI processes.
>>> 
>>> The starter process is defined as being our mpirun. In Open MPI, 
>>> MPIR_partial_attach_ok is defined, so the tool will assume that we provide 
>>> a means to synchronize the processes that is not based on MPIR_debug_gate. 
>>> Therefore only one behavior is acceptable based on the text above: no 
>>> MPIR_debug_gate=1 should be issued by the tool.
>>> 
>>> However, in ompi_debuggers.c around line 226, we have an if that switches 
>>> between the two acceptable behaviors (MPIR_debug_gate or our own mechanism) 
>>> based on whether we are standalone (slurmd or generic) or not. As 
>>> generic is the ess loaded in most cases, I can't figure out how this 
>>> works if the MPIR specification document is to be trusted.
>>> 
>>> george.
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
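
For readers following the thread, the two rules quoted from the MPIR document
can be sketched roughly as below. This is only an illustration of the quoted
text, not the code in ompi_debuggers.c. MPIR_debug_gate is the variable named
in the document; the helpers tool_must_set_gate() and wait_at_debug_gate(),
and the bounded poll count, are hypothetical names and choices made up for
this sketch.

```c
#include <stdbool.h>

/* Variable named in the quoted MPIR text: the tool sets it to 1 once it
 * has attached. Declared volatile because it is written from outside the
 * process (by the debugger), so the compiler must re-read it every time. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper: the tool-side rule from the quoted startup text.
 * If MPIR_partial_attach_ok is defined in the starter process, the MPI
 * system provides the startup barrier and the tool need not set the gate;
 * otherwise the tool must set the gate to 1 in every MPI process. */
bool tool_must_set_gate(bool partial_attach_ok_defined)
{
    return !partial_attach_ok_defined;
}

/* Hypothetical helper: the process-side use of the gate. Polls the gate
 * at most max_polls times and returns true if it was found set. A real
 * implementation would presumably wait without a bound and sleep between
 * polls; the bound is an assumption that keeps the sketch testable. */
bool wait_at_debug_gate(unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (MPIR_debug_gate != 0) {
            return true;   /* tool has attached and released us */
        }
    }
    return MPIR_debug_gate != 0;
}
```

Under this reading, a runtime that defines MPIR_partial_attach_ok should not
depend on wait_at_debug_gate() ever being released by the tool, which is the
inconsistency George's question points at.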

