devel-boun...@open-mpi.org wrote on 02/09/2012 12:18:20 PM:

> From: Jeff Squyres <jsquy...@cisco.com>
> To: Open MPI Developers <de...@open-mpi.org>
> Date: 02/09/2012 12:18 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
>
> Just so that I understand this better -- if a process is bound in a
> cpuset, will tools like hwloc's lstopo only show the Linux
> processors *in that cpuset*? I.e., does it not have any visibility
> of the processors outside of its cpuset?

Yes, it looks like it. At least, this is what
opal_paffinity_base_get_processor_info() returns.

Regards,
Nadia

>
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>
> > Hi,
> >
> > If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> > is configured with:
> >    TaskPlugin=task/affinity
> >    TaskPluginParam=Cpusets
> >
> > each rank of that job is in a cpuset that contains a single CPU.
> >
> > Now, if we use carto on top of this, the following happens in
> > get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> >    . opal_paffinity_base_get_processor_info() is called to get the
> >      number of logical processors (we get 1 due to the singleton cpuset)
> >    . we loop over that # of processors to check whether our process is
> >      bound to one of them. In our case the loop will be executed only
> >      once and we will never get the correct binding information.
> >    . if the process is bound, we actually get the distance to the device.
> >      In our case we won't execute that part of the code.
> >
> > The attached patch is a proposal to fix the issue.
> >
> > Regards,
> > Nadia
> > <get_ib_dev_distance.patch>
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
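
For illustration only, the failure mode described in the quoted report can be modeled in a few lines of C. This is a rough sketch, not the Open MPI code and not the attached patch (which is not reproduced in this thread). It assumes the binding is a bitmask indexed by system-wide CPU id. The function names and the 64-CPU limit are hypothetical, chosen to keep the example small.

```c
#include <stdint.h>

/* Hypothetical model, not Open MPI code: bit i of `mask` is set when the
 * process is allowed to run on the CPU with system-wide id i. */

/* Loop bounded by the number of processors visible inside the cpuset,
 * as the report describes. Returns the first bound CPU id, or -1. */
int find_bound_cpu_by_count(uint64_t mask, int num_visible)
{
    for (int i = 0; i < num_visible && i < 64; i++) {
        if (mask & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1; /* looks unbound, although the mask may be non-empty */
}

/* Alternative for comparison: scan every possible CPU id up to max_ids
 * (capped at 64 here), independent of how many CPUs the cpuset shows. */
int find_bound_cpu_by_id(uint64_t mask, int max_ids)
{
    for (int i = 0; i < max_ids && i < 64; i++) {
        if (mask & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}
```

With a rank bound to CPU 5 in a singleton cpuset, the visible count is 1. The count-bounded loop then tests only id 0 and reports the process as unbound, while the id-bounded scan finds CPU 5. Under the assumptions above, that is why the distance computation is never reached.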