I was trying to understand how the debugger interface is supposed to work, and if I was confused before, that feeling has not gone away.

There is one thing that I really can't figure out, and I hope that somebody (Jeff/Ralph/Rolf, based on svn blame) can enlighten me: MPIR_debug_gate. In the document accepted by the MPI Forum we have the following definition:

> MPIR_debug_gate is an integer variable that is set to 1 by the tool to notify
> the MPI processes that the debugger has attached. An MPI process may use this
> variable as a synchronization mechanism to prevent it from running away before
> the tool has time to attach to the process.
>
> An MPI implementation is not required to use the MPIR_debug_gate variable for
> synchronization. However, the MPI job control runtime system must prevent the
> created MPI processes from running beyond the return from the applications
> call to MPI_INIT.

In case that is not clear enough, the section describing the startup process contains the following clarification:

> If the symbol MPIR_partial_attach_ok is defined in the starter process, then
> this informs the tool that the initial startup barrier is implemented by the
> MPI system, and it is not necessary to set the MPIR_debug_gate variable in any
> of MPI processes. However, if the symbol MPIR_partial_attach_ok is not defined
> in the starter process, the tool must attach and set the MPIR_debug_gate
> variable to 1 in each MPI processes to release them from the gate, even if the
> tool user has instructed the tool to not attach to all of the MPI processes.

The starter process is, in our case, mpirun. In Open MPI, MPIR_partial_attach_ok is defined, so the tool will assume that we provide a means of synchronizing the processes that is not based on MPIR_debug_gate. Therefore only one behavior is acceptable based on the text above: the tool should not set MPIR_debug_gate=1 at all.
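
To make my reading of the two quoted passages concrete, here is a minimal sketch in C. Only the symbol names MPIR_debug_gate and MPIR_partial_attach_ok come from the document. The functions wait_at_gate and tool_release_gates are hypothetical names invented for illustration, and the array of gates stands in for the per-process variables a real tool would write through the debugger. This is not code from Open MPI or from any tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Gate variable named by the MPIR document: the tool sets it to 1 after
 * attaching. It is volatile because it is written from outside the process. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical process-side helper: poll the gate until it is released.
 * A real implementation would sleep between polls and wait indefinitely;
 * the bound here only keeps the sketch terminating. */
bool wait_at_gate(const volatile int *gate, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (*gate != 0) {
            return true;            /* released by the tool */
        }
    }
    return *gate != 0;              /* still held after max_polls checks */
}

/* Hypothetical tool-side helper following the quoted rule:
 * - MPIR_partial_attach_ok defined in the starter: the MPI system owns the
 *   startup barrier, so the tool writes no gate at all;
 * - otherwise the tool must set the gate to 1 in every MPI process.
 * Returns the number of gates written. */
size_t tool_release_gates(bool partial_attach_ok,
                          volatile int *gates, size_t nprocs)
{
    assert(gates != NULL || nprocs == 0);
    if (partial_attach_ok) {
        return 0;
    }
    for (size_t i = 0; i < nprocs; ++i) {
        gates[i] = 1;
    }
    return nprocs;
}
```

Under this reading, a process that waits only on MPIR_debug_gate while the starter defines MPIR_partial_attach_ok would never be released by a conforming tool, which is exactly what puzzles me.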

However, in ompi_debuggers.c, around line 226, there is an if that switches between the two acceptable behaviors (MPIR_debug_gate or our own mechanism) depending on whether or not we are standalone (slurmd or generic). Since generic is the ess loaded in most cases, I can't figure out how this works if the MPIR specification document is to be trusted.

george.