Also, could you send the output of 'lstopo --of txt', please?
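Redirecting it into a file works fine for attaching, e.g. 'lstopo --of txt > lstopo.txt' (assuming hwloc's lstopo is on your PATH).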

Regards Hartmut
---------------
https://stellar.cct.lsu.edu
https://github.com/STEllAR-GROUP/hpx


> -----Original Message-----
> From: Hartmut Kaiser <[email protected]>
> Sent: Tuesday, May 25, 2021 7:25 AM
> To: '[email protected]' <[email protected]>
> Subject: RE: [hpx-users] Assign HPX localities to NUMA nodes, in order
>
> Kor,
>
> From what I can see, `--hpx:print-bind` does not report NUMA domains, only
> sockets. Why do you think the localities are not mapped correctly to NUMA
> domains (assuming the sequencing of printing the locality information is
> random and does not reflect the sequencing of the NUMA domains)?
>
> We could look into printing the NUMA domain information as well, however.
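>
> In the meantime, an easy way to double-check the NUMA placement from
> inside each rank, independently of HPX, is to ask hwloc directly. A
> minimal sketch (the file name and build line are just an example, this
> is not anything HPX provides):
>
>   // numa_check.cpp -- print the NUMA node(s) covered by the current
>   // process's CPU binding; build with: g++ numa_check.cpp -lhwloc
>   #include <hwloc.h>
>   #include <cstdio>
>
>   int main()
>   {
>       hwloc_topology_t topo;
>       hwloc_topology_init(&topo);
>       hwloc_topology_load(topo);
>
>       // the CPU set this process is currently bound to
>       hwloc_bitmap_t cpuset = hwloc_bitmap_alloc();
>       hwloc_get_cpubind(topo, cpuset, HWLOC_CPUBIND_PROCESS);
>
>       // report every NUMA node that intersects this binding
>       int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);
>       for (int i = 0; i < n; ++i)
>       {
>           hwloc_obj_t node =
>               hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, i);
>           if (hwloc_bitmap_intersects(cpuset, node->cpuset))
>               std::printf("bound to NUMA node %u\n", node->os_index);
>       }
>
>       hwloc_bitmap_free(cpuset);
>       hwloc_topology_destroy(topo);
>       return 0;
>   }
>
> Running that once per MPI rank should tell you unambiguously which NUMA
> node each rank ended up on.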
>
> HTH
> Regards Hartmut
> ---------------
> https://stellar.cct.lsu.edu
> https://github.com/STEllAR-GROUP/hpx
>
>
> > -----Original Message-----
> > From: [email protected] <hpx-users-bounces@stellar-
> > group.org> On Behalf Of Kor de Jong
> > Sent: Tuesday, May 25, 2021 2:40 AM
> > To: [email protected]
> > Subject: Re: [hpx-users] Assign HPX localities to NUMA nodes, in order
> >
> > Hi Mikael and other HPX experts,
> >
> > Thanks for your suggestions! Unfortunately they did not improve things
> > for me. To be clear, the only thing I don't understand is the binding
> > reported by `--hpx:print-bind`. What I do understand is:
> >
> > - The binding of MPI process ranks to numa nodes, reported by mpirun's
> > `--display-map`: process rank 0 is bound to numa node 0, process rank
> > 1 is bound to numa node 1, etc. This is exactly how I want things to be.
> >
> > - The relation between HPX localities and MPI ranks, printed from my own
> > code: hpx::get_locality_id() == hpx::util::mpi_environment::rank() ==
> > std::getenv("OMPI_COMM_WORLD_RANK") (a stripped-down version of this
> > check is sketched below the list). This implies that HPX localities
> > are ordered the same way as the MPI processes: locality 0 should be
> > bound to numa node 0, locality 1 should be bound to numa node 1, etc.
> > This is exactly how I want things to be.
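> >
> > For completeness, that check boils down to something like the
> > following (stripped down; the file name is mine and I left out the
> > mpi_environment part here):
> >
> >   // rank_check.cpp -- print the HPX locality id next to the MPI rank
> >   // that Open MPI exposes through the environment
> >   #include <hpx/hpx.hpp>
> >   #include <hpx/hpx_init.hpp>
> >   #include <cstdlib>
> >   #include <iostream>
> >
> >   int hpx_main(int, char**)
> >   {
> >       char const* rank = std::getenv("OMPI_COMM_WORLD_RANK");
> >       std::cout << "locality " << hpx::get_locality_id()
> >                 << " <-> OMPI rank " << (rank ? rank : "<unset>")
> >                 << "\n";
> >       return hpx::finalize();
> >   }
> >
> >   int main(int argc, char* argv[])
> >   {
> >       return hpx::init(argc, argv);
> >   }
> >
> > As said above, these two numbers match on every process here.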
> >
> > The weird thing is that, according to `--hpx:print-bind`, localities
> > are scattered over the numa nodes. Locality 0 always ends up at the
> > first numa node, but the other ones are bound to numa nodes in a
> > seemingly random order. When performing scaling tests over numa nodes,
> > the resulting graphs show artifacts which could be the result of HPX
> > localities not being ordered according to increasing memory latencies.
> >
> > At the moment I can only think of `--hpx:print-bind` being wrong,
> > which I guess is unlikely. But why does it suggest that the localities
> > are scattered over the numa nodes, when all other information indicates
> > that they are ordered according to the numa nodes?
> >
> > Maybe I am just misunderstanding things. To be able to interpret the
> > results of my scaling tests, I would really like to understand what is
> > going on.
> >
> > Thanks in advance for any insights any of you might have for me!
> >
> > Kor
> >
> >
> > On 5/21/21 5:02 PM, Simberg Mikael wrote:
> > > Hi Kor,
> > >
> > >
> > > The nondeterministic nature of your problem is a bit worrying, and I
> > > don't have any insight into that. However, there's an alternative
> > > way to set the bindings as well. Would you mind trying the
> > > --hpx:use-process-mask option to see if you get the expected bindings?
> > > By default HPX tries to reconstruct the bindings based on various
> > > environment variables, but if you pass --hpx:use-process-mask it
> > > will use the process mask that srun/mpi/others typically set, and
> > > only spawn worker threads on cores in the process mask. Note that
> > > the default, even with --hpx:use-process-mask, is still to only
> > > spawn one worker thread per core (not per hyperthread), so if you
> > > want exactly the binding you ask for with mpirun you should also add
> > > --hpx:threads=all.
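> > >
> > > For example, something along these lines (the mpirun mapping options
> > > and the executable name are only an illustration; adjust them to
> > > your job script):
> > >
> > >   mpirun -np 8 --map-by numa --bind-to numa \
> > >       ./your_app --hpx:use-process-mask --hpx:threads=all \
> > >       --hpx:print-bind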
> > >
> > >
> > > Mikael
> > >
> > > ------------------------------------------------------------------------
> > > *From:* [email protected]
> > > <[email protected]> on behalf of Kor de Jong
> > > <[email protected]>
> > > *Sent:* Friday, May 21, 2021 4:25:29 PM
> > > *To:* [email protected]
> > > *Subject:* {Spam?} [hpx-users] Assign HPX localities to NUMA nodes, in order
> > >
> > > Dear HPX-experts,
> > >
> > > I am trying to spawn 8 hpx processes on a cluster node with 8 numa
> > > nodes, containing 6 real cpu cores each. All seems well, but the
> > > output of `--hpx:print-bind` confuses me.
> > >
> > > I am using slurm (sbatch command) and openmpi (mpirun command in
> > > sbatch script). The output of mpirun's `--display-map` makes complete
> > > sense: each of the 8 process ranks is assigned to the 6 cores of one
> > > of the 8 numa nodes, in order. Process rank 0 is on the first numa
> > > node, etc.
> > >
> > > The output of `--hpx:print-bind` seems not in sync with this. There
> > > is a correspondence between mpi ranks and hpx locality ids, but the
> > > mapping of hpx localities to cpu cores is different now. For
> > > example, it seems that locality 1 is not on the second numa node (as
> > > per mpirun's `--display-map`), but on the 7th (as per hpx's
> > > `--print-bind`). Also, the output of `--print-bind` differs per
> > > invocation.
> > >
> > > It is important for me that hpx localities are assigned to numa
> > > nodes in order. Localities with similar IDs communicate more with
> > > each other than with other localities.
> > >
> > > I have attached the slurm script and outputs mentioned above. Does
> > > somebody maybe have an idea what is going on and how to fix things?
> > > Does hpx maybe re-assign the ranks upon initialization? If so, can I
> > > influence this so that the locality ordering matches the ordering of
> > > the numa nodes?
> > >
> > > BTW, I am pretty sure all this worked fine some time ago, when I was
> > > still using an earlier version of HPX, another version of MPI, and
> > > started HPX processes using srun instead of mpirun.
> > >
> > > Thanks for any info!
> > >
> > > Kor
> > >
> > >


_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
