They had better conform to what they asked us to "accept". It wasn't that the MPI Forum members were eager to put the tool interface into the standard; we were more or less forced to. By whom? Well, by the tool vendors, to promote a certain homogeneity.

  george.

On Nov 7, 2011, at 20:34, Ralph Castain wrote:

> I can't speak to what is in ompi_debuggers.c as I believe Jeff wrote most of
> that. However, what is there has been tested and works with TotalView and a
> couple of other debuggers.
>
> Best guess: from what I've seen, most debuggers don't seem to conform to what
> the MPI Forum has "accepted". It doesn't appear that the vendors and debugger
> developers pay too much attention to that document, possibly because it (a)
> came after the debuggers were developed, and (b) still doesn't seem to be
> widely adopted.
>
> I'd suggest being a little careful about making changes without consulting
> people who use TV and "stat", at least - those are the ones most recently
> tested.
>
>
> On Nov 7, 2011, at 5:59 PM, George Bosilca wrote:
>
>> I was trying to understand how the debugger interface is supposed to work.
>> And if I was confused before, that feeling never disappeared.
>>
>> There is one thing that I really can't figure out, and I hope that somebody
>> (Jeff/Ralph/Rolf based on svn blame) can enlighten me.
>>
>> MPIR_debug_gate. In the document accepted by the MPI Forum we have the
>> following definition:
>>
>>> MPIR_debug_gate is an integer variable that is set to 1 by the tool to
>>> notify the MPI processes that the debugger has attached. An MPI process
>>> may use this variable as a synchronization mechanism to prevent it from
>>> running away before the tool has time to attach to the process.
>>>
>>> An MPI implementation is not required to use the MPIR_debug_gate variable
>>> for synchronization. However, the MPI job control runtime system must
>>> prevent the created MPI processes from running beyond the return from the
>>> applications call to MPI_INIT.
>>
>> In case it is not clear enough, in the section describing the startup
>> process, we can find the following clarification:
>>
>>> If the symbol MPIR_partial_attach_ok is defined in the starter process,
>>> then this informs the tool that the initial startup barrier is implemented
>>> by the MPI system, and it is not necessary to set the MPIR_debug_gate
>>> variable in any of MPI processes. However, if the symbol
>>> MPIR_partial_attach_ok is not defined in the starter process, the tool
>>> must attach and set the MPIR_debug_gate variable to 1 in each MPI
>>> processes to release them from the gate, even if the tool user has
>>> instructed the tool to not attach to all of the MPI processes.
>>
>> The starter process is defined as being our mpirun. In Open MPI
>> MPIR_partial_attach_ok is defined, so the tool will assume that we provide
>> a means to synchronize the processes that is not based on MPIR_debug_gate.
>> Therefore only one behavior is acceptable based on the text above: no
>> MPIR_debug_gate=1 should be issued by the tool.
>>
>> However, in ompi_debuggers.c around line 226, we have an if that switches
>> between the two acceptable behaviors (MPIR_debug_gate or our own mechanism)
>> based on whether or not we are standalone (slurmd or generic). As generic
>> is the ess loaded in most cases, I can't figure out how this works if
>> the MPIR specification document is to be trusted.
>>
>> george.
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
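
For readers following the thread, the two rules quoted above from the MPIR document can be sketched roughly as below. This is only an illustration of the quoted text, not the code in ompi_debuggers.c. MPIR_debug_gate is the variable name from the document; the helper names (tool_must_release_gate, wait_for_debug_gate, simulated_tool_attach), the polling hook and the poll limit are hypothetical and exist only so the sketch can be exercised without a real debugger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Variable name taken from the MPIR document: the tool writes 1 here
 * once it has attached. volatile because it is modified from outside
 * the normal control flow of the process. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper (tool side). Per the quoted startup text: if
 * MPIR_partial_attach_ok is defined in the starter process, the MPI
 * system implements the startup barrier itself and the tool does not
 * need to set the gate; otherwise the tool must set it in every process. */
static bool tool_must_release_gate(bool partial_attach_ok_defined)
{
    return !partial_attach_ok_defined;
}

/* Hypothetical helper (MPI process side). Polls the gate up to
 * max_polls times. poll_hook stands in for whatever the process does
 * between polls (for example a short sleep); it may be NULL.
 * Returns true if the gate was opened, false if the limit was reached.
 * A real implementation would normally wait without a limit. */
static bool wait_for_debug_gate(void (*poll_hook)(void), int max_polls)
{
    assert(max_polls >= 0);
    for (int i = 0; i < max_polls; ++i) {
        if (MPIR_debug_gate != 0) {
            return true;
        }
        if (poll_hook != NULL) {
            poll_hook();
        }
    }
    return MPIR_debug_gate != 0;
}

/* Hypothetical stand-in for the debugger: a real tool would write the
 * variable in the target process's memory after attaching. */
static void simulated_tool_attach(void)
{
    MPIR_debug_gate = 1;
}
```

Under this reading, a starter that defines MPIR_partial_attach_ok gives tool_must_release_gate(true) == false, so an MPI process that still blocks in something like wait_for_debug_gate would never be released by a tool that follows the document. That is the mismatch the original question is about.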