Hello Simon,

I don't think anybody ever benchmarked this, but people have been complaining about this problem appearing on large machines for a while. I have a large SGI machine at work; I'll see if I can reproduce it.

One solution is to export the topology to XML once and then have all your MPI processes read it from XML instead of rediscovering it from the kernel. Basically, do "lstopo /tmp/foo.xml" and then export HWLOC_XMLFILE=/tmp/foo.xml in the environment before starting your MPI job. If the topology doesn't change (and that's likely the case), the XML file could even be stored by the administrator in a "standard" location (not in /tmp).
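For example, a sketch only (your_app stands for your binary, and the mpirun line is just whatever you normally use; HWLOC_THISSYSTEM=1 is an extra suggestion from me, it tells hwloc that the XML really describes the local machine, which matters if your ranks bind threads based on this topology):

    # Export the topology once; the .xml extension selects XML output.
    lstopo /tmp/foo.xml

    # Make every rank load the cached topology instead of rescanning /proc.
    export HWLOC_XMLFILE=/tmp/foo.xml
    export HWLOC_THISSYSTEM=1

    # Launch as usual.
    mpirun -np 56 ./your_app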
Brice


On 05/03/2013 20:23, Simon Hammond wrote:
> Hi HWLOC users,
>
> We are seeing some significant performance problems using HWLOC 1.6.2
> on Intel's MIC products. In one of our configurations we create 56 MPI
> ranks, each rank then queries the topology of the MIC card before
> creating threads. We are noticing that if we run 56 MPI ranks as
> opposed to one, the calls to query the topology in HWLOC are very slow;
> runtime goes from seconds to minutes (and upwards).
>
> We guessed that this might be caused by the kernel serializing access
> to the /proc filesystem but this is just a hunch.
>
> Has anyone had this problem and found an easy way to change the
> library / calls to HWLOC so that the slowdown is not experienced?
> Would you describe this as a bug?
>
> Thanks for your help.
>
>
> --
> Simon Hammond
>
> 1-(505)-845-7897 / MS-1319
> Scalable Computer Architectures
> Sandia National Laboratories, NM