Is your compute node included in your machine file?
If yes, what happens if you invoke mpirun from a compute node that is not
listed in your machine file?
It would also help if you posted your machine file.
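
The first question can be checked with a short script. The Python sketch below is only an illustration: it assumes the machine file follows the usual Open MPI hostfile layout (one host name per line, optionally followed by fields such as slots=N, with # starting a comment), that hosts are given by name rather than by IP address, and that comparing short host names is good enough. The function names are hypothetical.

```python
import socket


def machinefile_hosts(text):
    """Return the host names listed in a machine file, in file order."""
    hosts = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            # The host name is the first field; "slots=N" and similar may follow.
            hosts.append(entry.split()[0])
    return hosts


def short_name(host):
    """Strip the domain part so 'node1.cluster' and 'node1' compare equal.

    Assumption: entries are host names, not IP addresses.
    """
    return host.split(".", 1)[0].lower()


def node_is_listed(machinefile_text, hostname):
    """True if hostname appears in the machine file (short names compared)."""
    listed = {short_name(h) for h in machinefile_hosts(machinefile_text)}
    return short_name(hostname) in listed


def current_node_is_listed(path):
    """Check the node this script runs on against the machine file at path."""
    with open(path) as f:
        return node_is_listed(f.read(), socket.gethostname())
```

Calling current_node_is_listed("mf") on the node where mpirun is invoked, with mf being the machine file from the command line, tells you which of the two cases you are in.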

Cheers,

Gilles

On Thursday, April 13, 2017, Cyril Bordage <cyril.bord...@inria.fr> wrote:

> When I run this command from the compute node, I also get that, but not
> when I run it from a login node (with the same machine file).
>
>
> Cyril.
>
> On Apr 13, 2017, at 16:22, r...@open-mpi.org wrote:
> > We are asking all these questions because we cannot replicate your
> > problem - so we are trying to help you figure out what is different or
> > missing from your machine. When I run your cmd line on my system, I get:
> >
> > [rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 25 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 26 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 27 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 28 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 29 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 30 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 31 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 32 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 33 bound to socket 1[core 16[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 34 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 35 bound to socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
> > [rhc002.cluster:55965] MCW rank 36 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 37 bound to socket 1[core 18[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
> > [rhc002.cluster:55965] MCW rank 38 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 39 bound to socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
> > [rhc002.cluster:55965] MCW rank 40 bound to socket 0[core 8[hwt 0-1]]: [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 41 bound to socket 1[core 20[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
> > [rhc002.cluster:55965] MCW rank 42 bound to socket 0[core 9[hwt 0-1]]: [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 43 bound to socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
> > [rhc002.cluster:55965] MCW rank 44 bound to socket 0[core 10[hwt 0-1]]: [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 45 bound to socket 1[core 22[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
> > [rhc002.cluster:55965] MCW rank 46 bound to socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../../BB][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 47 bound to socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../../BB]
> > [rhc001:197743] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 1 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 3 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
> > [rhc001:197743] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 5 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
> > [rhc001:197743] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 7 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
> > [rhc001:197743] MCW rank 8 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 9 bound to socket 1[core 16[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
> > [rhc001:197743] MCW rank 10 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 11 bound to socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
> > [rhc001:197743] MCW rank 12 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 13 bound to socket 1[core 18[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
> > [rhc001:197743] MCW rank 14 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 15 bound to socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
> > [rhc001:197743] MCW rank 16 bound to socket 0[core 8[hwt 0-1]]: [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 17 bound to socket 1[core 20[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
> > [rhc001:197743] MCW rank 18 bound to socket 0[core 9[hwt 0-1]]: [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 19 bound to socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
> > [rhc001:197743] MCW rank 20 bound to socket 0[core 10[hwt 0-1]]: [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 21 bound to socket 1[core 22[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
> > [rhc001:197743] MCW rank 22 bound to socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../../BB][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 23 bound to socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../../BB]
> >
> > Exactly as expected. You might check that you have libnuma and
> > libnuma-devel installed.
> >
> >
> >> On Apr 13, 2017, at 6:50 AM, gil...@rist.or.jp wrote:
> >>
> >> OK thanks,
> >>
> >> We've had some issues in the past when Open MPI assumed that the (login)
> >> node running mpirun has the same topology as the other (compute) nodes.
> >> I just wanted to rule out this scenario.
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> ----- Original Message -----
> >>> I am using the 6886c12 commit.
> >>> I have no particular option for the configuration.
> >>> I launch my application in the same way as I presented in my first
> >>> email. Here is the exact line: mpirun -np 48 -machinefile mf -bind-to core
> >>> -report-bindings ./a.out
> >>>
> >>> lstopo does give the same output on both types of nodes. What is the
> >>> purpose of that?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Cyril.
> >>>
> >>> On Apr 13, 2017, at 15:24, gil...@rist.or.jp wrote:
> >>>> Also, can you please run
> >>>> lstopo
> >>>> on both your login and compute nodes?
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> Can you be a bit more specific?
> >>>>>
> >>>>> - What version of Open MPI are you using?
> >>>>> - How did you configure Open MPI?
> >>>>> - How are you launching Open MPI applications?
> >>>>>
> >>>>>
> >>>>>> On Apr 13, 2017, at 9:08 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> This bug now also happens when I launch my mpirun command from the
> >>>>>> compute node.
> >>>>>>
> >>>>>>
> >>>>>> Cyril.
> >>>>>>
> >>>>>> On Apr 6, 2017, at 05:38, r...@open-mpi.org wrote:
> >>>>>>> I believe this has been fixed now - please let me know
> >>>>>>>
> >>>>>>>> On Mar 30, 2017, at 1:57 AM, Cyril Bordage <cyril.bordage@inria.fr> wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I am using the git version of Open MPI with "-bind-to core -report-bindings"
> >>>>>>>> and I get this for all processes:
> >>>>>>>> [miriel010:160662] MCW rank 0 not bound
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> When I use an old version I get:
> >>>>>>>> [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>>>>>> [B/././././././././././.][./././././././././././.]
> >>>>>>>>
> >>>>>>>> From git bisect the culprit seems to be: 48fc339
> >>>>>>>>
> >>>>>>>> This bug happens only when I launch my mpirun command from a login node
> >>>>>>>> and not from a compute node.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Cyril.
> >>>>>>>> _______________________________________________
> >>>>>>>> devel mailing list
> >>>>>>>> devel@lists.open-mpi.org
> >>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Jeff Squyres
> >>>>> jsquy...@cisco.com
> >>>>>