On Oct 12, 2009, at 9:19 AM, Terry Dontje wrote:
Ralph Castain wrote:
I fixed the process schedule issue on the trunk over the weekend
(not moved to 1.3 yet while it "soaked") - the binding issue was
working fine on the trunk.
So there was an issue of "-mca orte_process_binding" not being
interpreted?
I could not replicate the binding problem on the trunk. I haven't
explored it further just yet.
I believe I applied the fix to stop calling register_params twice
to 1.3 already, but I can check.
No, I was asking whether that fix might be causing the
orte_process_binding MCA param not to be interpreted. But from what
you say in the first paragraph, I was probably wrong.
I don't see how, but I will look at it later.
--td
On Oct 12, 2009, at 4:36 AM, Terry Dontje wrote:
Regarding the "-mca XXX" option not overriding the file setting:
I thought I had seen this working for v1.3. However, I just
retested this and I am seeing the same issue of the "-mca" option
not affecting orte_process_binding or rmaps_base_schedule_policy.
This seems to work on the trunk. I wonder if the issue might
be something we did in r22050, where we stopped calling
orte_register_params twice? I am not sure exactly why that would
have prevented the MCA option from taking effect the first time.
--td
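For reference, the behaviour being expected here can be sketched as a simple lookup order. This is only a toy model, not Open MPI's code: it assumes the usual order of command line first, then environment, then parameter file, then built-in default. The `resolve` helper, the `PRECEDENCE` tuple, and the file value "none" are hypothetical names and data chosen for illustration.

```python
# Toy model of MCA parameter precedence; NOT Open MPI's implementation.
# Assumed order: command line > environment > parameter file > default.
PRECEDENCE = ("command line", "environment", "file", "default")


def resolve(name, sources):
    """Return (value, source) from the highest-priority source that sets name."""
    for source in PRECEDENCE:
        values = sources.get(source, {})
        if name in values:
            return values[name], source
    raise KeyError(name)


# Hypothetical settings: the file says "none", the command line says "socket".
sources = {
    "file": {"orte_process_binding": "none"},
    "command line": {"orte_process_binding": "socket"},
}

value, source = resolve("orte_process_binding", sources)
print(value, "from", source)
```

Under that assumed order the "-mca" value should win over the file value, which is the behaviour reported as missing on v1.3.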
Ralph Castain wrote:
Try adding -display-devel-map to your cmd line so you can see
what OMPI thinks the binding and mapping policy is set to -
that'll tell you if the problem is in the mapping or in the
daemon binding.
Also, it might help to know something about this node - like how
many sockets, cores/socket.
On Oct 8, 2009, at 11:17 PM, Eugene Loh wrote:
Here are two problems with openmpi-1.3.4a1r22051:
# Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
# using the MCA parameter form specified on the mpirun command line.
# No binding results. THIS IS PROBLEM 1.
% mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
saem9
saem9
saem9
saem9
saem9
# Same thing with the "core" form.
% mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
saem9
saem9
saem9
saem9
saem9
# Now, I set the MCA parameters as environment variables.
# I then check the spellings and confirm all is set using ompi_info.
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
% ompi_info -a | grep rmaps_base_schedule_policy
MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)
# So, now I run a simple program.
# I get binding now, but I'm filling up the first socket before going to the second.
# THIS IS PROBLEM 2.
% mpirun -np 5 -report-bindings hostname
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9
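To make output like this easier to read, here is a minimal Python sketch that groups ranks by socket and expands the CPU masks. It assumes the "-report-bindings" line format shown in this thread and that each mask is a hex bitmask in which bit i stands for CPU i. The helper names `cpus_in_mask` and `ranks_by_socket` are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern assumes the "-report-bindings" line format shown in this thread;
# \s+ tolerates lines that a mail client has wrapped.
BINDING = re.compile(
    r"binding child\s+\[\[\d+,\d+\],(\d+)\]\s+to socket\s+(\d+)"
    r"\s+cpus\s+([0-9a-fA-F]+)"
)


def cpus_in_mask(mask_hex):
    """Expand a hex CPU mask into CPU indices, assuming bit i means CPU i."""
    mask = int(mask_hex, 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def ranks_by_socket(report):
    """Group rank numbers by the socket each one was bound to."""
    sockets = defaultdict(list)
    for rank, socket, _mask in BINDING.findall(report):
        sockets[int(socket)].append(int(rank))
    return dict(sockets)


# Two of the lines reported above, as wrapped in the original mail.
report = """
[saem9:23947] [[29741,0],0] odls:default:fork binding child
[[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child
[[29741,1],4] to socket 1 cpus 00f0
"""

print(ranks_by_socket(report))
print(cpus_in_mask("000f"), cpus_in_mask("00f0"))
```

Under the bit-i-is-CPU-i assumption, 000f covers CPUs 0 to 3 and 00f0 covers CPUs 4 to 7, which would be consistent with two sockets of four cores each.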
# Adding -bysocket to the command line fixes things.
% mpirun -np 5 -bysocket -report-bindings hostname
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
saem9
saem9
saem9
saem9
saem9
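The difference between the two runs can be pictured with a toy model of the two placement orders. This is a sketch only, not the rmaps code. It assumes a node with 2 sockets of 4 cores each, which is inferred from the 000f and 00f0 masks rather than stated anywhere in the thread. The function names are hypothetical.

```python
def map_by_slot(nprocs, sockets, cores_per_socket):
    """Fill every core of one socket before moving on to the next socket."""
    return [(rank // cores_per_socket) % sockets for rank in range(nprocs)]


def map_by_socket(nprocs, sockets):
    """Place consecutive ranks on successive sockets, round-robin."""
    return [rank % sockets for rank in range(nprocs)]


# Assumed topology: 2 sockets x 4 cores (inferred from the CPU masks).
print(map_by_slot(5, sockets=2, cores_per_socket=4))
print(map_by_socket(5, sockets=2))
```

With 5 processes, the fill-first order gives the socket sequence seen in PROBLEM 2 (0, 0, 0, 0, 1), while the round-robin order gives the sequence seen with -bysocket (0, 1, 0, 1, 0).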
Bug? Or am I doing something wrong?
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel