On 12/04/2011 15:14, Jeff Squyres wrote:
> On Apr 12, 2011, at 8:10 AM, Brice Goglin wrote:
>
>> I am looking for a good way to specify PCI and OS devices on the
>> command-line (for hwloc-calc and hwloc-bind).
>>
>> The trunk currently supports:
>> * os:foobar for the OS device named foobar (eth0, mlx4_0, ...)
>> * pci:0000:00:00.0 or pci:00:00.0 for a given PCI device
>> * pci:aaaa:bbbb:c for the c-th PCI device with vendor ID aaaa and device
>> ID bbbb
>>
>> The idea is basically to make it easy to bind processes near some
>> high-performance devices:
>> hwloc-bind os:mlx4_0 <mympibenchmark>
>> hwloc-bind pci:nvidia:tesla:0 <mycudabenchmark>
>>
> Nifty.
>
> Can you list multiple devices? E.g.:
>
> hwloc-bind os:mlx4_0 os:mlx4_1 my_mpi_benchmark
>

Yes, that works. We have just extended the way we parse a single
"location" on the command line. All existing operations on these
locations (add, subtract, xor, negate) still work.

> Also, is there a CLI way to retrieve which numa nodes / OS processors are
> near such devices? I can imagine wanting to script up something like:
>
> - retrieve a mask / list of processors near OS device <foo>
> - binding N processes, one per processor, to the processors near that device
>

Once you have a way to specify an I/O device, you can convert it into
whatever hwloc-calc can output. For instance:
    hwloc-calc os:mlx4_0 --pulist --po
gives the comma-separated list of physical indexes of the PUs near mlx4_0.

By the way, for this exact case, we should actually support:
    hwloc-distribute <N> --restrict $(hwloc-calc os:mlx4_0)
I'll look at this.

>> Ideally, the os:foobar notation would be enough. But as long as we don't
>> have any OS name associated with (proprietary) GPUs, people will have to
>> identify GPUs by their PCI ids.
>>
>> Other ideas that we may want to support:
>> * PCI devices by name: something like the 2nd PCI device whose name
>> contains "tesla C2070" so that people don't have to dig into lspci
>> manually to find out the vendor/device IDs or busids (mostly useful for
>> GPUs that have no OS names)
>>
> I immediately had that question when I read your 2nd example, above (i.e.,
> where did you get the names from?). Are these names in the lstopo output?
>

PCI names are only in the verbose output (they are usually very long).
OS names are always shown.

>> * OS devices by class: something like os:net:2 for the 2nd network
>> interface (not sure it's useful)
>>
> I'm not sure it is -- isn't the ordering of PCI devices non-deterministic
> between cold boots?
>

As long as you don't plug/unplug anything in between, it should be OK,
but I can't be strictly sure about this. The ordering won't change, but
the OS names may still change because of udev.

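As a rough illustration of the scripting case raised above (one process
per PU near a device), here is a minimal Python sketch. It only handles
the comma-separated PU list that hwloc-calc is described as printing. The
sample string and both helper names are assumptions made up for this
example, and nothing here calls hwloc itself.

```python
def parse_pulist(text):
    """Parse a comma-separated list of PU indexes, e.g. "0,2,4,6"."""
    return [int(field) for field in text.strip().split(",") if field.strip()]


def one_pu_per_process(pus, nprocs):
    """Assign one PU to each of nprocs processes, wrapping around if
    there are more processes than PUs near the device."""
    if not pus:
        raise ValueError("no PU near the device")
    return [pus[rank % len(pus)] for rank in range(nprocs)]


# Hypothetical output of: hwloc-calc os:mlx4_0 --pulist --po
sample = "0,2,4,6\n"

# Process rank i would then be bound to assignment[i].
assignment = one_pu_per_process(parse_pulist(sample), 3)
print(assignment)
```

How each index is then handed to hwloc-bind is left out on purpose, since
that depends on the final command-line syntax being discussed here.
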
>> I/O devices will not be supported through the generic hierarchical
>> notation "socket:1.core:2..." anyway. So we could make their
>> command-line specification totally different from the usual one.
>>
>>
>> It's actually the first time we select objects on something different
>> than just a type or a depth and some indexes. So we could introduce a
>> new syntax here. For instance:
>> <type>[attributename=attributevalue,...]:index
>> <type>[attributename=attributevalue,...]:firstindex:lastindex
>> <type>[attributename=attributevalue,...]:firstindex:amount
>> Not sure it's worth doing this.
>>
> It might be better to just put out basic functionality in 1.3 and *not* do
> advanced syntax like this (i.e., only do basic syntax). And then see what
> people ask for.
>

Then we need to define what "basic syntax" means :)

Brice
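
For reference, the bracketed syntax quoted above could be parsed along
these lines. This is only a sketch of the proposal, not code from hwloc:
the function name, the returned dictionary layout, and the attribute
names in the example ("vendor", "name") are all hypothetical. Note that
the second number is returned as-is, because the ":firstindex:lastindex"
and ":firstindex:amount" forms look identical on the command line and
the proposal does not say how to tell them apart.

```python
import re

# <type>[name=value,...]:first[:second]   (the bracketed part is optional)
SPEC = re.compile(
    r"(?P<type>[^\[\]:]+)"          # object type, e.g. "pci" or "os"
    r"(?:\[(?P<attrs>[^\]]*)\])?"   # optional attribute filters
    r":(?P<first>\d+)"              # index, or first index
    r"(?::(?P<second>\d+))?"        # last index or amount (ambiguous)
)


def parse_spec(spec):
    """Split a location string into type, attribute filters and indexes."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("malformed location: %r" % spec)
    attrs = {}
    if m.group("attrs"):
        for pair in m.group("attrs").split(","):
            name, sep, value = pair.partition("=")
            if not sep or not name.strip():
                raise ValueError("malformed attribute: %r" % pair)
            attrs[name.strip()] = value.strip()
    second = m.group("second")
    return {
        "type": m.group("type"),
        "attrs": attrs,
        "first": int(m.group("first")),
        "second": int(second) if second is not None else None,
    }


# Hypothetical attribute names, for illustration only.
print(parse_spec("pci[vendor=nvidia,name=tesla C2070]:1"))
```
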