This might get interesting. I think of "portable hardware locality" (hwloc) as originating at the native cpuset, and I see "locality" working at the machine level (machines in my world can have up to 8 CPUs, for example).
But from an ompi world view, the execution graph across myriad machines might dictate a larger, yet still fine grained approach. I haven't had a chance to play with those aspects. Has anyone else?

Ken

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, August 29, 2011 8:21 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] known limitation or bug in hwloc?

Actually, I'll eat those words. I was looking at the wrong place.

Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those cases where the bit mask extends over multiple words.

On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:

> Actually, if you look closely at the definition of those two values, you'll see that it really doesn't matter which one we loop over. The NUM_BITS value defines the actual total number of bits in the mask. The CPU_MAX is the total number of cpus we can support, which was set to a value such that the two are equal (i.e., it's a power of two that happens to be an integer multiple of 64).
>
> I believe the original intent was to allow CPU_MAX to be independent of address-alignment questions, so NUM_BITS could technically be greater than CPU_MAX. Even if this happens, though, all that would do is cause the loop to run across more bits than required.
>
> So it doesn't introduce a limitation at all. In hindsight, we could simplify things by eliminating one of those values and just putting a requirement on the number that it be a multiple of 64 so it aligns with a memory address.
>
>
> On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
>
>> Nadia,
>>
>> Interesting. I haven't tried pushing this to levels above 8 on a particular
>> machine. Do you think that the cpuset / paffinity / hwloc only applies at
>> the machine level, at which time you need to employ a graph with carto?
>>
>> Regards,
>>
>> Ken
>>
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>> Behalf Of nadia.derbey
>> Sent: Monday, August 29, 2011 5:45 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] known limitation or bug in hwloc?
>>
>> Hi list,
>>
>> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>>
>> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
>> the routine that sets the calling process affinity to the mask given as
>> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
>> allow the cpus to be potentially numbered up to
>> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>>
>> The problem with module_set() is that it loops over
>> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
>> the mask:
>>
>> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i)
>> {
>>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>>         hwloc_bitmap_set(set, i);
>>     }
>> }
>>
>> Given "mask"'s type, I think module_set() should instead loop over
>> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>>
>> Note that module_set() uses a type for its internal mask that is
>> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>>
>> So I'm wondering whether this is a known limitation I've never heard of
>> or an actual bug?
>>
>> Regards,
>> Nadia
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
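For anyone reading this thread later, here is a minimal, self-contained sketch of the behaviour under discussion. It does not use the real OPAL or hwloc headers: cpu_mask_t, WORD_NUM_BITS, CPU_MAX, mask_set(), mask_isset() and count_cpus() are hypothetical stand-ins, and the 64-bit word and the limit of 1024 CPUs are assumptions chosen for illustration. WORD_NUM_BITS plays the role of OPAL_PAFFINITY_BITMASK_T_NUM_BITS (bits in one word of the mask) and CPU_MAX plays the role of OPAL_PAFFINITY_BITMASK_CPU_MAX (bits in the whole mask).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the paffinity definitions in the thread. */
typedef uint64_t word_t;                                     /* one word of the mask (assumed 64 bits) */
#define WORD_NUM_BITS ((unsigned int)(sizeof(word_t) * CHAR_BIT)) /* bits per word */
#define CPU_MAX       1024u                                  /* assumed maximum number of CPUs */
#define NUM_WORDS     (CPU_MAX / WORD_NUM_BITS)              /* words needed for the whole mask */

/* A CPU mask that spans several words, like the cpu set type in the thread. */
typedef struct {
    word_t bits[NUM_WORDS];
} cpu_mask_t;

/* Mark one CPU as present in the mask. */
static void mask_set(cpu_mask_t *m, unsigned int cpu)
{
    assert(cpu < CPU_MAX);
    m->bits[cpu / WORD_NUM_BITS] |= (word_t)1 << (cpu % WORD_NUM_BITS);
}

/* Return 1 if the CPU is present in the mask, 0 otherwise. */
static int mask_isset(const cpu_mask_t *m, unsigned int cpu)
{
    assert(cpu < CPU_MAX);
    return (int)((m->bits[cpu / WORD_NUM_BITS] >> (cpu % WORD_NUM_BITS)) & 1u);
}

/* Count the CPUs found when scanning only the first `limit` bit positions.
 * A loop bounded by WORD_NUM_BITS never looks past the first word; a loop
 * bounded by CPU_MAX scans the whole mask. */
static unsigned int count_cpus(const cpu_mask_t *m, unsigned int limit)
{
    unsigned int found = 0;
    for (unsigned int i = 0; i < limit; ++i) {
        if (mask_isset(m, i)) {
            ++found;
        }
    }
    return found;
}
```

With a zeroed mask in which CPUs 3 and 70 are set, count_cpus(&m, WORD_NUM_BITS) sees only CPU 3, while count_cpus(&m, CPU_MAX) sees both. That is the same effect as the module_set() loop silently dropping every CPU numbered 64 or higher.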