On Feb 7, 2014, at 9:45 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

> On 06/02/2014 21:31, Brock Palen wrote:
>> Actually that did turn out to help. The nvml# devices appear to be numbered 
>> in the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices are 
>> in the order that PBS and nvidia-smi see them.
> 
> By the way, did you have CUDA_VISIBLE_DEVICES set during the lstopo below? 
> Was it set to 2,3,0,1? That would explain the reordering.

It was not set, and I have double checked it just now to be sure.

> 
> I am not sure in which order you want to do things in the end. One way that 
> could help is:
> * Get the locality of each GPU by doing CUDA_VISIBLE_DEVICES=x (for x in 
> 0..number of gpus-1). Each iteration gives a single GPU in hwloc, and you can 
> retrieve the corresponding locality from the cuda0 object.
> * Once you know which GPUs you want based on the locality info, take the 
> corresponding #x and put them in CUDA_VISIBLE_DEVICES=x,y before you run your 
> program. hwloc will create cuda0 for x and cuda1 for y.
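
A rough sketch of that probing loop in Python (this assumes 8 GPUs, that
hwloc's CUDA backend is built, and that lstopo prints its console output when
run without a display; the output parsing is illustrative only):

import os
import re
import subprocess

NGPUS = 8  # assumed GPU count on this node

# Expose one physical CUDA device at a time.  hwloc then creates a single
# "cuda0" object, and the PCI device it sits under tells us where physical
# device x lives.
for x in range(NGPUS):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(x))
    out = subprocess.check_output(["lstopo", "-v"], env=env).decode()
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if '"cuda0"' not in line:
            continue
        # Walk back to the enclosing 'PCI ... busid=... class=0302(3D)' line.
        for prev in reversed(lines[:i]):
            m = re.search(r"busid=(\S+).*class=0302", prev)
            if m:
                print("CUDA device %d -> PCI %s" % (x, m.group(1)))
                break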

The cuda IDs match the order you get from nvidia-smi (which reports the PCI 
addresses).

The nvml IDs match the order in which the devices start. That is, with 
CUDA_VISIBLE_DEVICES=0, cudaSetDevice(0) matches nvml0, which corresponds to 
id 2 for CoProc cuda2 and to nvidia-smi id 2.

This appears to be very consistent across reboots.
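
For completeness, here is a rough sketch of the XML-parsing script mentioned
further down (it pairs each cuda* object with the nvml* object under the same
PCI device and sorts by bus ID instead of relying on L#). The attribute names
(pci_busid, type, name) are taken from hwloc 1.x XML output and should be
checked against your lstopo version:

import subprocess
import xml.etree.ElementTree as ET

# Export the topology as XML ("-" sends the output to stdout).
xml_out = subprocess.check_output(["lstopo", "--of", "xml", "-"]).decode()
root = ET.fromstring(xml_out)

gpu_map = {}  # busid -> {"cuda": ..., "nvml": ...}
for pci in root.iter("object"):
    if pci.get("type") != "PCIDev":
        continue
    busid = pci.get("pci_busid")
    # The cuda* and nvml* OS devices are direct children of the PCI device.
    for osdev in pci.findall("object"):
        name = osdev.get("name", "")
        if name.startswith(("cuda", "nvml")):
            gpu_map.setdefault(busid, {})[name[:4]] = name

for busid in sorted(gpu_map):  # ascending PCI bus ID, i.e. nvidia-smi order
    print("%s %s" % (busid, gpu_map[busid]))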
> 
> If you don't set CUDA_VISIBLE_DEVICES, cuda* objects are basically 
> out-of-order. nvml objects are (a bit less likely) ordered by PCI bus id 
> (lstopo -v would confirm that).

Yes, the nvml ordering is by ascending PCI ID; nvidia-smi shows this:

[root@nyx7500 ~]# nvidia-smi | grep Tesla
|   0  Tesla K20Xm         Off  | 0000:09:00.0     Off |                    0 |
|   1  Tesla K20Xm         Off  | 0000:0A:00.0     Off |                    0 |
|   2  Tesla K20Xm         Off  | 0000:0D:00.0     Off |                    0 |
|   3  Tesla K20Xm         Off  | 0000:0E:00.0     Off |                    0 |
|   4  Tesla K20Xm         Off  | 0000:28:00.0     Off |                    0 |
|   5  Tesla K20Xm         Off  | 0000:2B:00.0     Off |                    0 |
|   6  Tesla K20Xm         Off  | 0000:30:00.0     Off |                    0 |
|   7  Tesla K20Xm         Off  | 0000:33:00.0     Off |                    0 |

[root@nyx7500 ~]# lstopo -v
Machine (P#0 total=67073288KB DMIProductName="ProLiant SL270s Gen8   " 
DMIProductVersion= DMIProductSerial="USE3267A92      " 
DMIProductUUID=36353439-3437-5553-4533-323637413932 DMIBoardVendor=HP 
DMIBoardName= DMIBoardVersion= DMIBoardSerial="USE3267A92      " 
DMIBoardAssetTag="        " DMIChassisVendor=HP DMIChassisType=25 
DMIChassisVersion= DMIChassisSerial="USE3267A90      " DMIChassisAssetTag="     
   " DMIBIOSVendor=HP DMIBIOSVersion=P75 DMIBIOSDate=09/18/2013 DMISysVendor=HP 
Backend=Linux LinuxCgroup=/ OSName=Linux OSRelease=2.6.32-358.23.2.el6.x86_64 
OSVersion="#1 SMP Sat Sep 14 05:32:37 EDT 2013" 
HostName=nyx7500.engin.umich.edu Architecture=x86_64)
  NUMANode L#0 (P#0 local=33518860KB total=33518860KB)
    Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz" 
CPUVendor=GenuineIntel CPUModelNumber=45 CPUFamilyNumber=6)
      L3Cache L#0 (size=20480KB linesize=64 ways=20)
        L2Cache L#0 (size=256KB linesize=64 ways=8)
          L1dCache L#0 (size=32KB linesize=64 ways=8)
            L1iCache L#0 (size=32KB linesize=64 ways=8)
              Core L#0 (P#0)
                PU L#0 (P#0)
        L2Cache L#1 (size=256KB linesize=64 ways=8)
          L1dCache L#1 (size=32KB linesize=64 ways=8)
            L1iCache L#1 (size=32KB linesize=64 ways=8)
              Core L#1 (P#1)
                PU L#1 (P#1)
        L2Cache L#2 (size=256KB linesize=64 ways=8)
          L1dCache L#2 (size=32KB linesize=64 ways=8)
            L1iCache L#2 (size=32KB linesize=64 ways=8)
              Core L#2 (P#2)
                PU L#2 (P#2)
        L2Cache L#3 (size=256KB linesize=64 ways=8)
          L1dCache L#3 (size=32KB linesize=64 ways=8)
            L1iCache L#3 (size=32KB linesize=64 ways=8)
              Core L#3 (P#3)
                PU L#3 (P#3)
        L2Cache L#4 (size=256KB linesize=64 ways=8)
          L1dCache L#4 (size=32KB linesize=64 ways=8)
            L1iCache L#4 (size=32KB linesize=64 ways=8)
              Core L#4 (P#4)
                PU L#4 (P#4)
        L2Cache L#5 (size=256KB linesize=64 ways=8)
          L1dCache L#5 (size=32KB linesize=64 ways=8)
            L1iCache L#5 (size=32KB linesize=64 ways=8)
              Core L#5 (P#5)
                PU L#5 (P#5)
        L2Cache L#6 (size=256KB linesize=64 ways=8)
          L1dCache L#6 (size=32KB linesize=64 ways=8)
            L1iCache L#6 (size=32KB linesize=64 ways=8)
              Core L#6 (P#6)
                PU L#6 (P#6)
        L2Cache L#7 (size=256KB linesize=64 ways=8)
          L1dCache L#7 (size=32KB linesize=64 ways=8)
            L1iCache L#7 (size=32KB linesize=64 ways=8)
              Core L#7 (P#7)
                PU L#7 (P#7)
    Bridge Host->PCI L#0 (P#0 buses=0000:[00-14])
      Bridge PCI->PCI (P#16 busid=0000:00:01.0 id=8086:3c02 class=0604(PCI_B) 
link=2.00GB/s buses=0000:[05-05])
        PCI 1000:0087 (P#20480 busid=0000:05:00.0 class=0107(SAS) link=2.00GB/s)
          Block L#0 "sda"
          Block L#1 "sdb"
      Bridge PCI->PCI (P#32 busid=0000:00:02.0 id=8086:3c04 class=0604(PCI_B) 
link=15.75GB/s buses=0000:[0b-0e])
        Bridge PCI->PCI (P#45056 busid=0000:0b:00.0 id=10b5:8747 
class=0604(PCI_B) link=15.75GB/s buses=0000:[0c-0e])
          Bridge PCI->PCI (P#49280 busid=0000:0c:08.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[0d-0d])
            PCI 10de:1021 (P#53248 busid=0000:0d:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#2 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda0"
              GPU L#3 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039409 
NVIDIAUUID=GPU-ce438227-9e75-de70-22ea-37dbe4de5219) "nvml2"
          Bridge PCI->PCI (P#49408 busid=0000:0c:10.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[0e-0e])
            PCI 10de:1021 (P#57344 busid=0000:0e:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#4 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda1"
              GPU L#5 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039509 
NVIDIAUUID=GPU-1079ef10-bf05-a0bc-c942-5f6a650b1691) "nvml3"
      Bridge PCI->PCI (P#48 busid=0000:00:03.0 id=8086:3c08 class=0604(PCI_B) 
link=15.75GB/s buses=0000:[07-0a])
        Bridge PCI->PCI (P#28672 busid=0000:07:00.0 id=10b5:8747 
class=0604(PCI_B) link=15.75GB/s buses=0000:[08-0a])
          Bridge PCI->PCI (P#32896 busid=0000:08:08.0 id=10b5:8747 
class=0604(PCI_B) link=8.00GB/s buses=0000:[09-09])
            PCI 10de:1021 (P#36864 busid=0000:09:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#6 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda2"
              GPU L#7 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039709 
NVIDIAUUID=GPU-185e845c-0887-501c-75e2-0d025c651910) "nvml0"
          Bridge PCI->PCI (P#33024 busid=0000:08:10.0 id=10b5:8747 
class=0604(PCI_B) link=8.00GB/s buses=0000:[0a-0a])
            PCI 10de:1021 (P#40960 busid=0000:0a:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#8 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda3"
              GPU L#9 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039717 
NVIDIAUUID=GPU-f13fa871-57ce-47b8-a6c3-c8d35efa686d) "nvml1"
      Bridge PCI->PCI (P#448 busid=0000:00:1c.0 id=8086:1d10 class=0604(PCI_B) 
link=2.00GB/s buses=0000:[02-02])
        PCI 8086:1521 (P#8192 busid=0000:02:00.0 class=0200(Ether) 
link=2.00GB/s)
          Network L#10 (Address=c8:cb:b8:cd:18:4a) "eth0"
        PCI 8086:1521 (P#8193 busid=0000:02:00.1 class=0200(Ether) 
link=2.00GB/s)
          Network L#11 (Address=c8:cb:b8:cd:18:4b) "eth1"
      Bridge PCI->PCI (P#455 busid=0000:00:1c.7 id=8086:1d1e class=0604(PCI_B) 
link=0.25GB/s buses=0000:[01-01])
        PCI 102b:0533 (P#4097 busid=0000:01:00.1 class=0300(VGA) link=0.25GB/s)
      PCI 8086:1d02 (P#498 busid=0000:00:1f.2 class=0106(SATA))
  NUMANode L#1 (P#1 local=33554428KB total=33554428KB)
    Socket L#1 (P#1 CPUModel="Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz" 
CPUVendor=GenuineIntel CPUModelNumber=45 CPUFamilyNumber=6)
      L3Cache L#1 (size=20480KB linesize=64 ways=20)
        L2Cache L#8 (size=256KB linesize=64 ways=8)
          L1dCache L#8 (size=32KB linesize=64 ways=8)
            L1iCache L#8 (size=32KB linesize=64 ways=8)
              Core L#8 (P#0)
                PU L#8 (P#8)
        L2Cache L#9 (size=256KB linesize=64 ways=8)
          L1dCache L#9 (size=32KB linesize=64 ways=8)
            L1iCache L#9 (size=32KB linesize=64 ways=8)
              Core L#9 (P#1)
                PU L#9 (P#9)
        L2Cache L#10 (size=256KB linesize=64 ways=8)
          L1dCache L#10 (size=32KB linesize=64 ways=8)
            L1iCache L#10 (size=32KB linesize=64 ways=8)
              Core L#10 (P#2)
                PU L#10 (P#10)
        L2Cache L#11 (size=256KB linesize=64 ways=8)
          L1dCache L#11 (size=32KB linesize=64 ways=8)
            L1iCache L#11 (size=32KB linesize=64 ways=8)
              Core L#11 (P#3)
                PU L#11 (P#11)
        L2Cache L#12 (size=256KB linesize=64 ways=8)
          L1dCache L#12 (size=32KB linesize=64 ways=8)
            L1iCache L#12 (size=32KB linesize=64 ways=8)
              Core L#12 (P#4)
                PU L#12 (P#12)
        L2Cache L#13 (size=256KB linesize=64 ways=8)
          L1dCache L#13 (size=32KB linesize=64 ways=8)
            L1iCache L#13 (size=32KB linesize=64 ways=8)
              Core L#13 (P#5)
                PU L#13 (P#13)
        L2Cache L#14 (size=256KB linesize=64 ways=8)
          L1dCache L#14 (size=32KB linesize=64 ways=8)
            L1iCache L#14 (size=32KB linesize=64 ways=8)
              Core L#14 (P#6)
                PU L#14 (P#14)
        L2Cache L#15 (size=256KB linesize=64 ways=8)
          L1dCache L#15 (size=32KB linesize=64 ways=8)
            L1iCache L#15 (size=32KB linesize=64 ways=8)
              Core L#15 (P#7)
                PU L#15 (P#15)
    Bridge Host->PCI L#12 (P#1 buses=0000:[20-3d])
      Bridge PCI->PCI (P#131088 busid=0000:20:01.0 id=8086:3c02 
class=0604(PCI_B) link=7.88GB/s buses=0000:[21-25])
        Bridge PCI->PCI (P#135168 busid=0000:21:00.0 id=10b5:8724 
class=0604(PCI_B) link=7.88GB/s buses=0000:[22-25])
          Bridge PCI->PCI (P#139280 busid=0000:22:01.0 id=10b5:8724 
class=0604(PCI_B) link=7.88GB/s buses=0000:[23-23])
            PCI 15b3:1003 (P#143360 busid=0000:23:00.0 class=0280(Net) 
link=7.88GB/s)
              Network L#12 (Address=24:be:05:8b:e4:e2 Port=2) "eth2"
              Network L#13 
(Address=80:00:00:49:fe:80:00:00:00:00:00:00:24:be:05:ff:ff:8b:e4:e1 Port=1) 
"ib0"
              Network L#14 (Address=06:00:00:00:03:29 Port=1) "eoib0"
              OpenFabrics L#15 (NodeGUID=24be:05ff:ff8b:e4e0 
SysImageGUID=24be:05ff:ff8b:e4e3 Port1State=4 Port1LID=0x2f8 Port1LMC=0 
Port1GID0=fe80:0000:0000:0000:24be:05ff:ff8b:e4e1 Port2State=1 Port2LID=0x0 
Port2LMC=0 Port2GID0=fe80:0000:0000:0000:26be:05ff:fe8b:e4e2) "mlx4_0"
      Bridge PCI->PCI (P#131104 busid=0000:20:02.0 id=8086:3c04 
class=0604(PCI_B) link=15.75GB/s buses=0000:[26-2d])
        Bridge PCI->PCI (P#155648 busid=0000:26:00.0 id=10b5:8747 
class=0604(PCI_B) link=15.75GB/s buses=0000:[27-2d])
          Bridge PCI->PCI (P#159872 busid=0000:27:08.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[28-28])
            PCI 10de:1021 (P#163840 busid=0000:28:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#16 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda4"
              GPU L#17 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039422 
NVIDIAUUID=GPU-89053185-7a14-cdc7-c89f-9a69b64cef0a) "nvml4"
          Bridge PCI->PCI (P#160000 busid=0000:27:10.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[2b-2b])
            PCI 10de:1021 (P#176128 busid=0000:2b:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#18 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda5"
              GPU L#19 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039702 
NVIDIAUUID=GPU-20a32c55-de79-c7b0-74ed-cbbc9fc2bfee) "nvml5"
      Bridge PCI->PCI (P#131120 busid=0000:20:03.0 id=8086:3c08 
class=0604(PCI_B) link=15.75GB/s buses=0000:[2e-35])
        Bridge PCI->PCI (P#188416 busid=0000:2e:00.0 id=10b5:8747 
class=0604(PCI_B) link=15.75GB/s buses=0000:[2f-35])
          Bridge PCI->PCI (P#192640 busid=0000:2f:08.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[30-30])
            PCI 10de:1021 (P#196608 busid=0000:30:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#20 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda6"
              GPU L#21 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039633 
NVIDIAUUID=GPU-d24b7e36-3a28-f787-4497-c43356a7ff2d) "nvml6"
          Bridge PCI->PCI (P#192768 busid=0000:2f:10.0 id=10b5:8747 
class=0604(PCI_B) link=4.00GB/s buses=0000:[33-33])
            PCI 10de:1021 (P#208896 busid=0000:33:00.0 class=0302(3D) 
link=8.00GB/s)
              Co-Processor L#22 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA 
Corporation" GPUModel="Tesla K20Xm") "cuda7"
              GPU L#23 (Backend=NVML GPUVendor="NVIDIA Corporation" 
GPUModel="Tesla K20Xm" NVIDIASerial=0320413039548 
NVIDIAUUID=GPU-01fa129f-f63c-2542-d9fc-ad6dfe3e9467) "nvml7"
depth 0:        1 Machine (type #1)
 depth 1:       2 NUMANode (type #2)
  depth 2:      2 Socket (type #3)
   depth 3:     2 L3Cache (type #4)
    depth 4:    16 L2Cache (type #4)
     depth 5:   16 L1dCache (type #4)
      depth 6:  16 L1iCache (type #4)
       depth 7: 16 Core (type #5)
        depth 8:        16 PU (type #6)
Special depth -3:       24 Bridge (type #9)
Special depth -4:       14 PCI Device (type #10)
Special depth -5:       24 OS Device (type #11)
latency matrix between NUMANodes (depth 1) by logical indexes:
  index     0     1
      0 1.000 2.000
      1 2.000 1.000
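
Putting the two outputs together, the node-specific map from my first mail
below boils down to something like this (a quick sketch; the bus numbers are
the hex bus IDs above converted to decimal):

# Order in which each tool reports the GPUs, by PCI bus number (decimal).
nvidia_smi_order   = [9, 10, 13, 14, 40, 43, 48, 51]   # PBS / nvidia-smi order
device_query_order = [13, 14, 9, 10, 40, 43, 48, 51]   # CUDA enumeration order

# smi_to_cuda[i] = CUDA index to use when the batch system assigned us
# nvidia-smi device i.
smi_to_cuda = [device_query_order.index(bus) for bus in nvidia_smi_order]
print(smi_to_cuda)  # [2, 3, 0, 1, 4, 5, 6, 7]

# Example: Torque assigned GPUs 0 and 1 to the job.
assigned = [0, 1]
print("CUDA_VISIBLE_DEVICES=" + ",".join(str(smi_to_cuda[i]) for i in assigned))
# CUDA_VISIBLE_DEVICES=2,3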




> 
> Brice
> 
> 
> 
>> 
>>       PCIBridge
>>         PCIBridge
>>           PCIBridge
>>             PCI 10de:1021
>>               CoProc L#2 "cuda0"
>>               GPU L#3 "nvml2"
>>           PCIBridge
>>             PCI 10de:1021
>>               CoProc L#4 "cuda1"
>>               GPU L#5 "nvml3"
>>       PCIBridge
>>         PCIBridge
>>           PCIBridge
>>             PCI 10de:1021
>>               CoProc L#6 "cuda2"
>>               GPU L#7 "nvml0"
>>           PCIBridge
>>             PCI 10de:1021
>>               CoProc L#8 "cuda3"
>>               GPU L#9 "nvml1"
>> 
>> 
>> Right now I am trying to create a python script that will take the XML 
>> output of lstopo and give me just the cuda and nvml devices in order. 
>> 
>> I don't know if some values are deterministic though.  Could I ignore the 
>> CoProc line and just use:
>> 
>>               GPU L#3 "nvml2"
>>               GPU L#5 "nvml3"
>>               GPU L#7 "nvml0"
>>               GPU L#9 "nvml1"
>> 
>> Is the L# always going to be in the order I would expect?  Because then I 
>> already have my map. 
>> 
> 
> 
> 
> Brice
> 
> 
>> 
>> Brock Palen
>> 
>> www.umich.edu/~brockp
>> 
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> 
>> bro...@umich.edu
>> 
>> (734)936-1985
>> 
>> 
>> 
>> On Feb 5, 2014, at 1:19 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>> 
>> 
>>> Hello Brock,
>>> 
>>> Some people reported the same issue in the past and that's why we added the 
>>> "nvml" objects. CUDA reorders devices by "performance". Batch-schedulers 
>>> are somehow supposed to use "nvml" for managing GPUs without actually using 
>>> them with CUDA directly. And the "nvml" order is the "normal" order.
>>> 
>>> You need "tdk" (https://developer.nvidia.com/tesla-deployment-kit) to get 
>>> the nvml library and development headers installed. Then hwloc can build 
>>> its "nvml" backend. Once ready, you'll see a hwloc "cudaX" and a hwloc 
>>> "nvmlY" object in each NVIDIA PCI device, and you can get their locality 
>>> as usual.
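
[Aside: a rough sketch of pulling that locality out of lstopo's XML export,
again assuming hwloc 1.x attribute names:]

import subprocess
import xml.etree.ElementTree as ET

root = ET.fromstring(
    subprocess.check_output(["lstopo", "--of", "xml", "-"]).decode())

def report(node, numa=None):
    # Remember the most recent NUMANode seen while walking down the tree.
    if node.get("type") == "NUMANode":
        numa = node.get("os_index")
    name = node.get("name", "")
    if node.get("type") == "OSDev" and name.startswith(("cuda", "nvml")):
        print("%s is local to NUMA node %s" % (name, numa))
    for child in node.findall("object"):
        report(child, numa)

report(root.find("object"))  # start from the Machine object
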
>>> 
>>> Does this help?
>>> 
>>> Brice
>>> 
>>> 
>>> 
>>> On 05/02/2014 05:25, Brock Palen wrote:
>>> 
>>>> We are trying to build a system to restrict users to the GPUs they were 
>>>> assigned by our batch system (Torque).
>>>> 
>>>> The batch system sets the GPUs into thread-exclusive mode when they are 
>>>> assigned to a job, so we want the GPU that the batch system assigns to be 
>>>> the one set in CUDA_VISIBLE_DEVICES.
>>>> 
>>>> The problem is that on our nodes, what the batch system sees as GPU 0 is 
>>>> not the same GPU that CUDA_VISIBLE_DEVICES sees as 0.  Actually, 0 is 2.
>>>> 
>>>> You can see this behavior if you run nvidia-smi and look at the PCI IDs of 
>>>> the devices.  You can then look at the PCI IDs output by deviceQuery from 
>>>> the SDK examples and see that they are in a different order.
>>>> 
>>>> The IDs you would set in CUDA_VISIBLE_DEVICES match the order that 
>>>> deviceQuery sees, not the order that nvidia-smi sees.
>>>> 
>>>> Example (All values turned to decimal to match deviceQuery):
>>>> 
>>>> nvidia-smi order: 9, 10, 13, 14, 40, 43, 48, 51
>>>> deviceQuery order: 13, 14, 9, 10, 40, 43, 48, 51
>>>> 
>>>> 
>>>> Can hwloc help me with this?  Right now I am hacking together a script 
>>>> based on the output of the two commands, making a map between the two, and 
>>>> then setting CUDA_VISIBLE_DEVICES.
>>>> 
>>>> Any ideas would be great. Later, since we also use CPU sets, we want to 
>>>> pass GPU locality information to the scheduler so it can match GPUs to CPU 
>>>> sockets, as the performance of threads across QPI domains is very poor. 
>>>> 
>>>> Thanks
>>>> 
>>>> Machine (64GB)
>>>>   NUMANode L#0 (P#0 32GB)
>>>>     Socket L#0 + L3 L#0 (20MB)
>>>>       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 
>>>> (P#0)
>>>>       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 
>>>> (P#1)
>>>>       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 
>>>> (P#2)
>>>>       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 
>>>> (P#3)
>>>>       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 
>>>> (P#4)
>>>>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 
>>>> (P#5)
>>>>       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 
>>>> (P#6)
>>>>       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 
>>>> (P#7)
>>>>     HostBridge L#0
>>>>       PCIBridge
>>>>         PCI 1000:0087
>>>>           Block L#0 "sda"
>>>>           Block L#1 "sdb"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#2 "cuda0"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#3 "cuda1"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#4 "cuda2"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#5 "cuda3"
>>>>       PCIBridge
>>>>         PCI 8086:1521
>>>>           Net L#6 "eth0"
>>>>         PCI 8086:1521
>>>>           Net L#7 "eth1"
>>>>       PCIBridge
>>>>         PCI 102b:0533
>>>>       PCI 8086:1d02
>>>>   NUMANode L#1 (P#1 32GB)
>>>>     Socket L#1 + L3 L#1 (20MB)
>>>>       L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 
>>>> (P#8)
>>>>       L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 
>>>> (P#9)
>>>>       L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU 
>>>> L#10 (P#10)
>>>>       L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU 
>>>> L#11 (P#11)
>>>>       L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU 
>>>> L#12 (P#12)
>>>>       L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU 
>>>> L#13 (P#13)
>>>>       L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU 
>>>> L#14 (P#14)
>>>>       L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU 
>>>> L#15 (P#15)
>>>>     HostBridge L#12
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 15b3:1003
>>>>               Net L#8 "eth2"
>>>>               Net L#9 "ib0"
>>>>               Net L#10 "eoib0"
>>>>               OpenFabrics L#11 "mlx4_0"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#12 "cuda4"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#13 "cuda5"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#14 "cuda6"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#15 "cuda7"
>>>> 
>>>> 
>>>> Brock Palen
>>>> 
>>>> 
>>>> www.umich.edu/~brockp
>>>> 
>>>> 
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> 
>>>> 
>>>> bro...@umich.edu
>>>> 
>>>> 
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 