Ralph Castain wrote:
Try adding -display-devel-map to your cmd line so you can see what
OMPI thinks the binding and mapping policy is set to - that'll tell
you if the problem is in the mapping or in the daemon binding.
Also, it might help to know something about this node - like how many
sockets, cores/socket.
Okay. I added -display-devel-map, which reports the socket/core
information you're looking for. Beyond that, I don't know how to read
the output.
# PROBLEM 1: no binding (socket variant)
% mpirun -display-devel-map -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
Map generated by mapping policy: 0400
Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
Num new daemons: 0 New daemon starting vpid INVALID
Num nodes: 1
Data for node: Name: saem9 Launch id: -1 Arch: ffca0200 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[11629,0],0] Daemon launched: True
Num slots: 1 Slots in use: 5
Num slots allocated: 1 Max slots: 0
Username on node: NULL
Num procs: 5 Next node_rank: 5
Data for proc: [[11629,1],0]
Pid: 0 Local rank: 0 Node rank: 0
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11629,1],1]
Pid: 0 Local rank: 1 Node rank: 1
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11629,1],2]
Pid: 0 Local rank: 2 Node rank: 2
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11629,1],3]
Pid: 0 Local rank: 3 Node rank: 3
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11629,1],4]
Pid: 0 Local rank: 4 Node rank: 4
State: 0 App_context: 0 Slot list: NULL
saem9
saem9
saem9
saem9
saem9
# PROBLEM 1: no binding (core variant)
% mpirun -display-devel-map -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
Map generated by mapping policy: 0400
Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
Num new daemons: 0 New daemon starting vpid INVALID
Num nodes: 1
Data for node: Name: saem9 Launch id: -1 Arch: ffca0200 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[11639,0],0] Daemon launched: True
Num slots: 1 Slots in use: 5
Num slots allocated: 1 Max slots: 0
Username on node: NULL
Num procs: 5 Next node_rank: 5
Data for proc: [[11639,1],0]
Pid: 0 Local rank: 0 Node rank: 0
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11639,1],1]
Pid: 0 Local rank: 1 Node rank: 1
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11639,1],2]
Pid: 0 Local rank: 2 Node rank: 2
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11639,1],3]
Pid: 0 Local rank: 3 Node rank: 3
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11639,1],4]
Pid: 0 Local rank: 4 Node rank: 4
State: 0 App_context: 0 Slot list: NULL
saem9
saem9
saem9
saem9
saem9
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
# check envvars with ompi_info
% ompi_info -a | grep rmaps_base_schedule_policy
MCA rmaps: parameter "rmaps_base_schedule_policy" (current
value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
MCA orte: parameter "orte_process_binding" (current value:
"socket", data source: environment)
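For anyone scripting this instead of typing setenv: the two variables above are just the MCA parameter names with an OMPI_MCA_ prefix, so the same environment can be built programmatically. A minimal Python sketch; the helper name mca_env is made up for illustration, and nothing here actually launches mpirun.

```python
import os


def mca_env(params, base=None):
    """Return a copy of an environment with OMPI_MCA_<name> set per parameter.

    `mca_env` is a hypothetical helper, not part of Open MPI.
    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


# Same two settings as the setenv lines; base={} keeps the printout short.
env = mca_env(
    {
        "rmaps_base_schedule_policy": "socket",
        "orte_process_binding": "socket",
    },
    base={},
)
for key in sorted(env):
    print(key + "=" + env[key])
```

In real use, leave out base={} so PATH and the rest are inherited, and pass the result as the env argument of subprocess.run when starting mpirun.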
# PROBLEM 2: binding is not alternating by socket
% mpirun -display-devel-map -np 5 -report-bindings hostname
Map generated by mapping policy: 0404
Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
Num new daemons: 0 New daemon starting vpid INVALID
Num nodes: 1
Data for node: Name: saem9 Launch id: -1 Arch: ffca0200 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[11645,0],0] Daemon launched: True
Num slots: 1 Slots in use: 5
Num slots allocated: 1 Max slots: 0
Username on node: NULL
Num procs: 5 Next node_rank: 5
Data for proc: [[11645,1],0]
Pid: 0 Local rank: 0 Node rank: 0
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11645,1],1]
Pid: 0 Local rank: 1 Node rank: 1
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11645,1],2]
Pid: 0 Local rank: 2 Node rank: 2
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11645,1],3]
Pid: 0 Local rank: 3 Node rank: 3
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[11645,1],4]
Pid: 0 Local rank: 4 Node rank: 4
State: 0 App_context: 0 Slot list: NULL
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],0] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],1] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],2] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],3] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9
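To make the Problem 2 pattern easier to see, here is a rough Python sketch that tallies the -report-bindings lines by socket and expands each cpus mask. It assumes the mask is a hexadecimal bitmask in which bit i stands for logical CPU i. That is my reading of 000f and 00f0 on this node (2 sockets, 4 cores/socket), not something I have confirmed in the OMPI source. The function names are mine.

```python
import re
from collections import Counter

# The -report-bindings lines from the run above.
REPORT = """\
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],0] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],1] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],2] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],3] to socket 0 cpus 000f
[saem9:01243] [[11645,0],0] odls:default:fork binding child [[11645,1],4] to socket 1 cpus 00f0
"""

# \s+ after "to" also matches a line break, in case a mailer wrapped the line.
BINDING = re.compile(
    r"binding child \[\[\d+,\d+\],(\d+)\] to\s+socket (\d+) cpus ([0-9a-fA-F]+)"
)


def cpus_in_mask(mask_hex):
    """Expand a hex mask into CPU indices (assumption: bit i == logical CPU i)."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_bindings(text):
    """Return a list of (rank, socket, cpu_list) tuples found in the text."""
    return [
        (int(rank), int(sock), cpus_in_mask(mask))
        for rank, sock, mask in BINDING.findall(text)
    ]


bindings = parse_bindings(REPORT)
per_socket = Counter(sock for _, sock, _ in bindings)

for rank, sock, cpus in bindings:
    print(f"rank {rank}: socket {sock}, cpus {cpus}")
print("procs per socket:", dict(per_socket))
```

For the run above this puts four ranks on socket 0 (CPUs 0-3) and one on socket 1 (CPUs 4-7), rather than the 0,1,0,1,0 alternation I expected.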