Ralph Castain wrote:

Try adding -display-devel-map to your cmd line so you can see what OMPI thinks the binding and mapping policy is set to - that'll tell you if the problem is in the mapping or in the daemon binding.

Also, it might help to know something about this node - like how many sockets, cores/socket.

Okay. I added -display-devel-map, which tells you the socket/core information you're looking for. Beyond that, I don't know how to read the output.

# PROBLEM 1:  no binding (socket variant)

% mpirun -display-devel-map -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname

 Map generated by mapping policy: 0400
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: Name: saem9             Launch id: -1   Arch: ffca0200  State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[11629,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 5
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 5    Next node_rank: 5
        Data for proc: [[11629,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11629,1],1]
                Pid: 0  Local rank: 1   Node rank: 1
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11629,1],2]
                Pid: 0  Local rank: 2   Node rank: 2
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11629,1],3]
                Pid: 0  Local rank: 3   Node rank: 3
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11629,1],4]
                Pid: 0  Local rank: 4   Node rank: 4
                State: 0        App_context: 0  Slot list: NULL
saem9
saem9
saem9
saem9
saem9
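
For reading the node line in the map above, here is a minimal sketch that pulls out the board/socket/core counts and multiplies them into a core total. The function name node_topology is hypothetical, and the sketch assumes the three "Num ..." fields appear on one line exactly as in the dump above.

```shell
# Hypothetical helper: reads -display-devel-map output on stdin and prints
# the board, socket and core counts plus the resulting total core count.
# Assumes "Num boards", "Num sockets/board" and "Num cores/socket" share a line.
node_topology() {
    sed -n 's|.*Num boards:[[:space:]]*\([0-9][0-9]*\).*Num sockets/board:[[:space:]]*\([0-9][0-9]*\).*Num cores/socket:[[:space:]]*\([0-9][0-9]*\).*|\1 \2 \3|p' |
    while read -r boards sockets cores; do
        echo "boards=$boards sockets/board=$sockets cores/socket=$cores total_cores=$((boards * sockets * cores))"
    done
}
```

Fed the node line from this run (1 board, 2 sockets/board, 4 cores/socket), it reports total_cores=8, so 5 processes should fit without sharing a core.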

# PROBLEM 1:  no binding (core variant)

% mpirun -display-devel-map -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname

 Map generated by mapping policy: 0400
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: Name: saem9             Launch id: -1   Arch: ffca0200  State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[11639,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 5
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 5    Next node_rank: 5
        Data for proc: [[11639,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11639,1],1]
                Pid: 0  Local rank: 1   Node rank: 1
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11639,1],2]
                Pid: 0  Local rank: 2   Node rank: 2
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11639,1],3]
                Pid: 0  Local rank: 3   Node rank: 3
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11639,1],4]
                Pid: 0  Local rank: 4   Node rank: 4
                State: 0        App_context: 0  Slot list: NULL
saem9
saem9
saem9
saem9
saem9

% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket

# check envvars with ompi_info
% ompi_info -a | grep rmaps_base_schedule_policy
               MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
                MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)
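
The setenv lines above are csh syntax. For Bourne-style shells (sh, bash), a sketch of the equivalent would be the following. It sets the same two OMPI_MCA_ variables to the same values and nothing else.

```shell
# Bourne-shell equivalent of the two csh setenv lines above.
export OMPI_MCA_rmaps_base_schedule_policy=socket
export OMPI_MCA_orte_process_binding=socket
```

The same ompi_info checks shown above should then report "data source: environment" for both parameters.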

# PROBLEM 2: binding is not alternating by socket

% mpirun -display-devel-map -np 5 -report-bindings hostname

 Map generated by mapping policy: 0404
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: Name: saem9             Launch id: -1   Arch: ffca0200  State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[11645,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 5
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 5    Next node_rank: 5
        Data for proc: [[11645,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11645,1],1]
                Pid: 0  Local rank: 1   Node rank: 1
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11645,1],2]
                Pid: 0  Local rank: 2   Node rank: 2
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11645,1],3]
                Pid: 0  Local rank: 3   Node rank: 3
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[11645,1],4]
                Pid: 0  Local rank: 4   Node rank: 4
                State: 0        App_context: 0  Slot list: NULL
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],0] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],1] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],2] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],3] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9
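
To make the binding lines above easier to read, here is a rough sketch of two helpers. Both names (mask_to_cpus, bindings_per_socket) are hypothetical and not part of Open MPI. The first assumes the "cpus" field is a hexadecimal bitmask in which bit i stands for logical CPU i, which fits a node with 2 sockets of 4 cores each but is not confirmed anywhere in the output. The second just counts the "binding child" lines per socket.

```shell
# Assumption: the "cpus" value is a hex bitmask, bit i = logical CPU i.
# Prints the CPU indices whose bits are set, separated by spaces.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"   # append, with a space unless first
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

# Reads -report-bindings output on stdin and prints how many processes
# were bound to each socket. Expects each binding report on one line.
bindings_per_socket() {
    sed -n 's/.*binding child .* to socket \([0-9][0-9]*\) cpus.*/\1/p' |
        sort -n | uniq -c | awk '{ print "socket " $2 ": " $1 " procs" }'
}
```

Under that assumption, `mask_to_cpus 000f` gives CPUs 0 1 2 3 and `mask_to_cpus 00f0` gives CPUs 4 5 6 7. Piping the five binding lines above through bindings_per_socket gives 4 procs on socket 0 and 1 on socket 1, which is the fill-first pattern of PROBLEM 2 rather than an alternation between sockets.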
