Kor,

From what I can see, `--hpx:print-bind` does not report NUMA domains, only
sockets. Why do you think the localities are not mapped correctly to the
NUMA domains (assuming that the order in which the locality information is
printed is arbitrary and does not reflect the order of the NUMA domains)?

We could, however, look into printing the NUMA domain information as well.
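
If it helps with cross-checking, below is a minimal, untested sketch (just
an illustration, not something we ship) that prints the locality id, the
OMPI_COMM_WORLD_RANK environment variable, and the NUMA node of the core
the calling HPX thread happens to be running on. It assumes a Linux system
with glibc (for sched_getcpu()) and libnuma (link with -lnuma):

    // Illustrative sketch only: report locality id, OpenMPI rank, and the
    // NUMA node of the core this HPX thread currently runs on.
    #include <hpx/hpx_init.hpp>
    #include <hpx/hpx.hpp>

    #include <numa.h>     // numa_available(), numa_node_of_cpu()
    #include <sched.h>    // sched_getcpu(), glibc-specific

    #include <cstdlib>
    #include <iostream>

    int hpx_main(int argc, char* argv[])
    {
        char const* ompi_rank = std::getenv("OMPI_COMM_WORLD_RANK");

        int cpu = sched_getcpu();
        int numa_node =
            (numa_available() != -1) ? numa_node_of_cpu(cpu) : -1;

        std::cout << "locality "    << hpx::get_locality_id()
                  << ", OMPI rank " << (ompi_rank ? ompi_rank : "n/a")
                  << ", cpu "       << cpu
                  << ", numa node " << numa_node << "\n";

        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);
    }

Running this under the same mpirun invocation (combined with
--hpx:use-process-mask and --hpx:threads=all, as Mikael suggested) should
show directly whether locality N really ends up on NUMA node N.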

HTH
Regards Hartmut
---------------
https://stellar.cct.lsu.edu
https://github.com/STEllAR-GROUP/hpx


> -----Original Message-----
> From: [email protected] <hpx-users-bounces@stellar-
> group.org> On Behalf Of Kor de Jong
> Sent: Tuesday, May 25, 2021 2:40 AM
> To: [email protected]
> Subject: Re: [hpx-users] Assign HPX localities to NUMA nodes, in order
>
> Hi Mikael and other HPX experts,
>
> Thanks for your suggestions! Unfortunately they did not improve things for
> me. To be clear, the only thing I don't understand is the binding reported
> by `--hpx:print-bind`. What I do understand is:
>
> - The binding of MPI process ranks to numa nodes, reported by mpirun's
> `--display-map`. Process rank 0 is bound to numa node 0, process rank 1 is
> bound to numa node 1, etc. This is exactly how I want things to be.
>
> - Relation between HPX localities and MPI ranks, printed from my own
> code: hpx::get_locality_id() == hpx::util::mpi_environment::rank() ==
> std::getenv("OMPI_COMM_WORLD_RANK"). This implies that HPX localities are
> ordered the same way as the MPI processes. Locality 0 should be bound to
> numa node 0, locality 1 should be bound to numa node 1, etc.
> This is exactly how I want things to be.
>
> The weird thing is that, according to `--hpx:print-bind`, localities are
> scattered over the numa nodes. Locality 0 always ends up at the first numa
> node, but the other ones are bound to numa nodes in a seemingly random
> order. When performing scaling tests over numa nodes, the resulting graphs
> show artifacts which could be the result of HPX localities not being
> ordered according to increasing memory latencies.
>
> At the moment I can only think of `--hpx:print-bind` being wrong, which is
> unlikely I guess. But why does it suggest that the localities are
> scattered over the numa nodes, when all other information suggests that
> they are ordered according to the numa nodes?
>
> Maybe I am just misunderstanding things. To be able to interpret the
> results of my scaling tests, I would really like to understand what is
> going on.
>
> Thanks in advance for any insights any of you might have for me!
>
> Kor
>
>
> On 5/21/21 5:02 PM, Simberg Mikael wrote:
> > Hi Kor,
> >
> >
> > The nondeterministic nature of your problem is a bit worrying, and I
> > don't have any insight into that. However, there's an alternative way
> > to set the bindings as well. Would you mind trying the
> > --hpx:use-process-mask option to see if you get the expected bindings?
> > By default HPX tries to reconstruct the bindings based on various
> > environment variables, but if you pass --hpx:use-process-mask it will
> > use the process mask that srun/mpi/others typically set, and only
> > spawn worker threads on cores in the process mask. Note that the
> > default, even with --hpx:use-process-mask, is still to only spawn one
> > worker thread per core (not per hyperthread), so if you want exactly
> > the binding you ask for with mpirun you should also add
> > --hpx:threads=all.
> >
> >
> > Mikael
> >
> > ------------------------------------------------------------------------
> > *From:* [email protected]
> > <[email protected]> on behalf of Kor de Jong
> > <[email protected]>
> > *Sent:* Friday, May 21, 2021 4:25:29 PM
> > *To:* [email protected]
> > *Subject:* {Spam?} [hpx-users] Assign HPX localities to NUMA nodes,
> > in order
> >
> > Dear HPX-experts,
> >
> > I am trying to spawn 8 hpx processes on a cluster node with 8 numa
> > nodes, containing 6 real cpu cores each. All seems well, but the
> > output of `--hpx:print-bind` confuses me.
> >
> > I am using slurm (sbatch command) and openmpi (mpirun command in
> > sbatch script). The output of mpirun's `--display-map` makes complete
> > sense.
> > All 8 process ranks get assigned to the 6 cores in the 8 numa nodes,
> > in order. Process rank 0 is on the first numa node, etc.
> >
> > The output of `--hpx:print-bind` seems not in sync with this. There is
> > a correspondence between mpi ranks and hpx locality ids, but the
> > mapping of hpx localities to cpu cores is different now. For example,
> > it seems that locality 1 is not on the second numa node (as per
> > mpirun's `--display-map`), but on the 7th (as per hpx's
> > `--print-bind`). Also, the output of `--print-bind` differs per
> > invocation.
> >
> > It is important for me that hpx localities are assigned to numa nodes
> > in order. Localities with similar IDs communicate more with each other
> > than with other localities.
> >
> > I have attached the slurm script and outputs mentioned above. Does
> > somebody maybe have an idea what is going on and how to fix things?
> > Does hpx maybe re-assign the ranks upon initialization? If so, can I
> > influence this to make this ordering similar to the ordering of the
> > numa nodes?
> >
> > BTW, I am pretty sure all this worked fine some time ago, when I was
> > still using an earlier version of HPX, another version of MPI, and
> > started HPX processes using srun instead of mpirun.
> >
> > Thanks for any info!
> >
> > Kor
> >
> >


_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
