I have been trying to understand how the debugger interface is supposed to
work, and I am still as confused as when I started.

There is one thing that I really can't figure out, and I hope that somebody 
(Jeff/Ralph/Rolf based on svn blame) can enlighten me.

MPIR_debug_gate. In the document accepted by the MPI Forum we have the 
following definition:

> MPIR_debug_gate is an integer variable that is set to 1 by the tool to notify
> the MPI processes that the debugger has attached. An MPI process may use this
> variable as a synchronization mechanism to prevent it from running away before
> the tool has time to attach to the process.
> 
> An MPI implementation is not required to use the MPIR_debug_gate variable for
> synchronization. However, the MPI job control runtime system must prevent the
> created MPI processes from running beyond the return from the applications
> call to MPI_INIT.
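
To make the quoted definition concrete, here is a minimal sketch of how an MPI
process might hold itself at the gate. Only the name MPIR_debug_gate comes from
the document; the helper wait_at_debug_gate and its max_polls bound are
hypothetical, added so the loop can be exercised without a real debugger. This
is not taken from the Open MPI sources.

```c
/* Written by the tool, through the debugger, once it has attached.
 * Declared volatile so the compiler re-reads it on every iteration. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper (not from any MPI implementation): spin until the
 * tool sets the gate to 1, or give up after max_polls checks.
 * Returns 1 if the gate was released, 0 if we gave up. */
static int wait_at_debug_gate(long max_polls)
{
    long polls;

    for (polls = 0; polls < max_polls; polls++) {
        if (MPIR_debug_gate != 0) {
            return 1;   /* the tool has attached and released us */
        }
        /* A real implementation would sleep or yield here instead of
         * burning CPU while it waits. */
    }
    return MPIR_debug_gate != 0;
}
```

A real process would presumably wait without a bound, somewhere before
returning from MPI_INIT; the bound only keeps the sketch from hanging when no
tool is present.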

In case it is not clear enough, in the section describing the startup process, 
we can find the following clarification:

> If the symbol MPIR_partial_attach_ok is defined in the starter process, then
> this informs the tool that the initial startup barrier is implemented by the
> MPI system, and it is not necessary to set the MPIR_debug_gate variable in any
> of MPI processes. However, if the symbol MPIR_partial_attach_ok is not defined
> in the starter process, the tool must attach and set the MPIR_debug_gate
> variable to 1 in each MPI processes to release them from the gate, even if the
> tool user has instructed the tool to not attach to all of the MPI processes.
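
The rule in this paragraph can be read as the following tool-side decision.
This is only a sketch of my reading of the text: struct proc_model and
release_processes are hypothetical names that model the processes in memory,
and no real debugger API is involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of one MPI process as the tool sees it. */
struct proc_model {
    bool attached;     /* has the tool attached to this process?      */
    int  debug_gate;   /* stands in for the process's MPIR_debug_gate */
};

/* Apply the quoted rule. partial_attach_ok says whether the symbol
 * MPIR_partial_attach_ok was found in the starter process.
 * Returns the number of processes whose gate the tool had to set. */
static size_t release_processes(bool partial_attach_ok,
                                struct proc_model *procs, size_t nprocs)
{
    size_t released = 0;
    size_t i;

    if (partial_attach_ok) {
        /* The startup barrier is implemented by the MPI system:
         * the tool does not need to touch any gate. */
        return 0;
    }

    /* Otherwise the tool must attach to every process and open its gate,
     * even if the user asked to debug only a subset. */
    for (i = 0; i < nprocs; i++) {
        procs[i].attached = true;
        procs[i].debug_gate = 1;
        released++;
    }
    return released;
}
```

Under this reading, a starter that defines MPIR_partial_attach_ok should never
see the tool write MPIR_debug_gate in any process.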

The starter process is, in our case, mpirun. Open MPI defines
MPIR_partial_attach_ok, so the tool will assume that we provide a means of
synchronizing the processes that is not based on MPIR_debug_gate. Therefore
only one behavior is acceptable based on the text above: the tool should not
set MPIR_debug_gate to 1.

However, in ompi_debuggers.c around line 226, we have an if that switches
between the two behaviors (MPIR_debug_gate or our own mechanism) depending on
whether we are standalone (slurmd or generic) or not. As generic is the ess
component loaded in most cases, I can't figure out how this works if the MPIR
specification document is to be trusted.

  george.

