On Feb 7, 2014, at 9:45 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
> On 06/02/2014 21:31, Brock Palen wrote:
>> Actually that did turn out to help. The nvml# devices appear to be
>> numbered in the way that CUDA_VISIBLE_DEVICES sees them, while the
>> cuda# devices are in the order that PBS and nvidia-smi see them.
>
> By the way, did you have CUDA_VISIBLE_DEVICES set during the lstopo
> below? Was it set to 2,3,0,1 ? That would explain the reordering.

It was not set, and I have double-checked it just now to be sure.

> I am not sure in which order you want to do things in the end. One way
> that could help is:
> * Get the locality of each GPU by doing CUDA_VISIBLE_DEVICES=x (for x
> in 0..number of gpus-1). Each iteration gives a single GPU in hwloc,
> and you can retrieve the corresponding locality from the cuda0 object.
> * Once you know which GPUs you want based on the locality info, take
> the corresponding #x and put them in CUDA_VISIBLE_DEVICES=x,y before
> you run your program. hwloc will create cuda0 for x and cuda1 for y.

The nvml IDs match the order that nvidia-smi reports (which gives you
the PCI addresses), while the cuda IDs match the order in which the
CUDA runtime enumerates the devices. That is, with CUDA_VISIBLE_DEVICES=0,
cudaSetDevice(0) selects cuda0, which is nvml2 and nvidia-smi id 2.
This appears to be very consistent between reboots.

> If you don't set CUDA_VISIBLE_DEVICES, cuda* objects are basically
> out-of-order. nvml objects are (a bit less likely) ordered by PCI bus
> id (lstopo -v would confirm that).

Yes, the nvml ordering is by ascending PCI ID; nvidia-smi shows this:

[root@nyx7500 ~]# nvidia-smi | grep Tesla
|   0  Tesla K20Xm         Off  | 0000:09:00.0     Off |                    0 |
|   1  Tesla K20Xm         Off  | 0000:0A:00.0     Off |                    0 |
|   2  Tesla K20Xm         Off  | 0000:0D:00.0     Off |                    0 |
|   3  Tesla K20Xm         Off  | 0000:0E:00.0     Off |                    0 |
|   4  Tesla K20Xm         Off  | 0000:28:00.0     Off |                    0 |
|   5  Tesla K20Xm         Off  | 0000:2B:00.0     Off |                    0 |
|   6  Tesla K20Xm         Off  | 0000:30:00.0     Off |                    0 |
|   7  Tesla K20Xm         Off  | 0000:33:00.0     Off |                    0 |

[root@nyx7500 ~]# lstopo -v
Machine (P#0 total=67073288KB DMIProductName="ProLiant SL270s Gen8 " DMIProductVersion= DMIProductSerial="USE3267A92 " DMIProductUUID=36353439-3437-5553-4533-323637413932 DMIBoardVendor=HP DMIBoardName= DMIBoardVersion= DMIBoardSerial="USE3267A92 " DMIBoardAssetTag=" " DMIChassisVendor=HP DMIChassisType=25 DMIChassisVersion= DMIChassisSerial="USE3267A90 " DMIChassisAssetTag=" " DMIBIOSVendor=HP DMIBIOSVersion=P75 DMIBIOSDate=09/18/2013 DMISysVendor=HP Backend=Linux LinuxCgroup=/ OSName=Linux OSRelease=2.6.32-358.23.2.el6.x86_64 OSVersion="#1 SMP Sat Sep 14 05:32:37 EDT 2013" HostName=nyx7500.engin.umich.edu Architecture=x86_64)
  NUMANode L#0 (P#0 local=33518860KB total=33518860KB)
    Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz" CPUVendor=GenuineIntel CPUModelNumber=45 CPUFamilyNumber=6)
      L3Cache L#0 (size=20480KB linesize=64 ways=20)
        L2Cache L#0 (size=256KB linesize=64 ways=8)
          L1dCache L#0 (size=32KB linesize=64 ways=8)
            L1iCache L#0 (size=32KB linesize=64 ways=8)
              Core L#0 (P#0)
                PU L#0 (P#0)
        L2Cache L#1 (size=256KB linesize=64 ways=8)
          L1dCache L#1 (size=32KB linesize=64 ways=8)
            L1iCache L#1 (size=32KB linesize=64 ways=8)
              Core L#1 (P#1)
                PU L#1 (P#1)
        L2Cache L#2 (size=256KB linesize=64 ways=8)
          L1dCache L#2 (size=32KB linesize=64 ways=8)
            L1iCache L#2 (size=32KB linesize=64 ways=8)
              Core L#2 (P#2)
                PU L#2 (P#2)
        L2Cache L#3 (size=256KB linesize=64 ways=8)
          L1dCache L#3 (size=32KB linesize=64 ways=8)
            L1iCache L#3 (size=32KB linesize=64 ways=8)
              Core L#3 (P#3)
                PU L#3 (P#3)
        L2Cache L#4 (size=256KB linesize=64 ways=8)
          L1dCache L#4 (size=32KB linesize=64 ways=8)
            L1iCache L#4 (size=32KB linesize=64 ways=8)
              Core L#4 (P#4)
                PU L#4 (P#4)
        L2Cache L#5 (size=256KB linesize=64 ways=8)
          L1dCache L#5 (size=32KB linesize=64 ways=8)
            L1iCache L#5 (size=32KB linesize=64 ways=8)
              Core L#5 (P#5)
                PU L#5 (P#5)
        L2Cache L#6 (size=256KB linesize=64 ways=8)
          L1dCache L#6 (size=32KB linesize=64 ways=8)
            L1iCache L#6 (size=32KB linesize=64 ways=8)
              Core L#6 (P#6)
                PU L#6 (P#6)
        L2Cache L#7 (size=256KB linesize=64 ways=8)
          L1dCache L#7 (size=32KB linesize=64 ways=8)
            L1iCache L#7 (size=32KB linesize=64 ways=8)
              Core L#7 (P#7)
                PU L#7 (P#7)
    Bridge Host->PCI L#0 (P#0 buses=0000:[00-14])
      Bridge PCI->PCI (P#16 busid=0000:00:01.0 id=8086:3c02 class=0604(PCI_B) link=2.00GB/s buses=0000:[05-05])
        PCI 1000:0087 (P#20480 busid=0000:05:00.0 class=0107(SAS) link=2.00GB/s)
          Block L#0 "sda"
          Block L#1 "sdb"
      Bridge PCI->PCI (P#32 busid=0000:00:02.0 id=8086:3c04 class=0604(PCI_B) link=15.75GB/s buses=0000:[0b-0e])
        Bridge PCI->PCI (P#45056 busid=0000:0b:00.0 id=10b5:8747 class=0604(PCI_B) link=15.75GB/s buses=0000:[0c-0e])
          Bridge PCI->PCI (P#49280 busid=0000:0c:08.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[0d-0d])
            PCI 10de:1021 (P#53248 busid=0000:0d:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#2 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda0"
              GPU L#3 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039409 NVIDIAUUID=GPU-ce438227-9e75-de70-22ea-37dbe4de5219) "nvml2"
          Bridge PCI->PCI (P#49408 busid=0000:0c:10.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[0e-0e])
            PCI 10de:1021 (P#57344 busid=0000:0e:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#4 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda1"
              GPU L#5 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039509 NVIDIAUUID=GPU-1079ef10-bf05-a0bc-c942-5f6a650b1691) "nvml3"
      Bridge PCI->PCI (P#48 busid=0000:00:03.0 id=8086:3c08 class=0604(PCI_B) link=15.75GB/s buses=0000:[07-0a])
        Bridge PCI->PCI (P#28672 busid=0000:07:00.0 id=10b5:8747 class=0604(PCI_B) link=15.75GB/s buses=0000:[08-0a])
          Bridge PCI->PCI (P#32896 busid=0000:08:08.0 id=10b5:8747 class=0604(PCI_B) link=8.00GB/s buses=0000:[09-09])
            PCI 10de:1021 (P#36864 busid=0000:09:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#6 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda2"
              GPU L#7 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039709 NVIDIAUUID=GPU-185e845c-0887-501c-75e2-0d025c651910) "nvml0"
          Bridge PCI->PCI (P#33024 busid=0000:08:10.0 id=10b5:8747 class=0604(PCI_B) link=8.00GB/s buses=0000:[0a-0a])
            PCI 10de:1021 (P#40960 busid=0000:0a:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#8 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda3"
              GPU L#9 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039717 NVIDIAUUID=GPU-f13fa871-57ce-47b8-a6c3-c8d35efa686d) "nvml1"
      Bridge PCI->PCI (P#448 busid=0000:00:1c.0 id=8086:1d10 class=0604(PCI_B) link=2.00GB/s buses=0000:[02-02])
        PCI 8086:1521 (P#8192 busid=0000:02:00.0 class=0200(Ether) link=2.00GB/s)
          Network L#10 (Address=c8:cb:b8:cd:18:4a) "eth0"
        PCI 8086:1521 (P#8193 busid=0000:02:00.1 class=0200(Ether) link=2.00GB/s)
          Network L#11 (Address=c8:cb:b8:cd:18:4b) "eth1"
      Bridge PCI->PCI (P#455 busid=0000:00:1c.7 id=8086:1d1e class=0604(PCI_B) link=0.25GB/s buses=0000:[01-01])
        PCI 102b:0533 (P#4097 busid=0000:01:00.1 class=0300(VGA) link=0.25GB/s)
      PCI 8086:1d02 (P#498 busid=0000:00:1f.2 class=0106(SATA))
  NUMANode L#1 (P#1 local=33554428KB total=33554428KB)
    Socket L#1 (P#1 CPUModel="Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz" CPUVendor=GenuineIntel CPUModelNumber=45 CPUFamilyNumber=6)
      L3Cache L#1 (size=20480KB linesize=64 ways=20)
        L2Cache L#8 (size=256KB linesize=64 ways=8)
          L1dCache L#8 (size=32KB linesize=64 ways=8)
            L1iCache L#8 (size=32KB linesize=64 ways=8)
              Core L#8 (P#0)
                PU L#8 (P#8)
        L2Cache L#9 (size=256KB linesize=64 ways=8)
          L1dCache L#9 (size=32KB linesize=64 ways=8)
            L1iCache L#9 (size=32KB linesize=64 ways=8)
              Core L#9 (P#1)
                PU L#9 (P#9)
        L2Cache L#10 (size=256KB linesize=64 ways=8)
          L1dCache L#10 (size=32KB linesize=64 ways=8)
            L1iCache L#10 (size=32KB linesize=64 ways=8)
              Core L#10 (P#2)
                PU L#10 (P#10)
        L2Cache L#11 (size=256KB linesize=64 ways=8)
          L1dCache L#11 (size=32KB linesize=64 ways=8)
            L1iCache L#11 (size=32KB linesize=64 ways=8)
              Core L#11 (P#3)
                PU L#11 (P#11)
        L2Cache L#12 (size=256KB linesize=64 ways=8)
          L1dCache L#12 (size=32KB linesize=64 ways=8)
            L1iCache L#12 (size=32KB linesize=64 ways=8)
              Core L#12 (P#4)
                PU L#12 (P#12)
        L2Cache L#13 (size=256KB linesize=64 ways=8)
          L1dCache L#13 (size=32KB linesize=64 ways=8)
            L1iCache L#13 (size=32KB linesize=64 ways=8)
              Core L#13 (P#5)
                PU L#13 (P#13)
        L2Cache L#14 (size=256KB linesize=64 ways=8)
          L1dCache L#14 (size=32KB linesize=64 ways=8)
            L1iCache L#14 (size=32KB linesize=64 ways=8)
              Core L#14 (P#6)
                PU L#14 (P#14)
        L2Cache L#15 (size=256KB linesize=64 ways=8)
          L1dCache L#15 (size=32KB linesize=64 ways=8)
            L1iCache L#15 (size=32KB linesize=64 ways=8)
              Core L#15 (P#7)
                PU L#15 (P#15)
    Bridge Host->PCI L#12 (P#1 buses=0000:[20-3d])
      Bridge PCI->PCI (P#131088 busid=0000:20:01.0 id=8086:3c02 class=0604(PCI_B) link=7.88GB/s buses=0000:[21-25])
        Bridge PCI->PCI (P#135168 busid=0000:21:00.0 id=10b5:8724 class=0604(PCI_B) link=7.88GB/s buses=0000:[22-25])
          Bridge PCI->PCI (P#139280 busid=0000:22:01.0 id=10b5:8724 class=0604(PCI_B) link=7.88GB/s buses=0000:[23-23])
            PCI 15b3:1003 (P#143360 busid=0000:23:00.0 class=0280(Net) link=7.88GB/s)
              Network L#12 (Address=24:be:05:8b:e4:e2 Port=2) "eth2"
              Network L#13 (Address=80:00:00:49:fe:80:00:00:00:00:00:00:24:be:05:ff:ff:8b:e4:e1 Port=1) "ib0"
              Network L#14 (Address=06:00:00:00:03:29 Port=1) "eoib0"
              OpenFabrics L#15 (NodeGUID=24be:05ff:ff8b:e4e0 SysImageGUID=24be:05ff:ff8b:e4e3 Port1State=4 Port1LID=0x2f8 Port1LMC=0 Port1GID0=fe80:0000:0000:0000:24be:05ff:ff8b:e4e1 Port2State=1 Port2LID=0x0 Port2LMC=0 Port2GID0=fe80:0000:0000:0000:26be:05ff:fe8b:e4e2) "mlx4_0"
      Bridge PCI->PCI (P#131104 busid=0000:20:02.0 id=8086:3c04 class=0604(PCI_B) link=15.75GB/s buses=0000:[26-2d])
        Bridge PCI->PCI (P#155648 busid=0000:26:00.0 id=10b5:8747 class=0604(PCI_B) link=15.75GB/s buses=0000:[27-2d])
          Bridge PCI->PCI (P#159872 busid=0000:27:08.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[28-28])
            PCI 10de:1021 (P#163840 busid=0000:28:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#16 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda4"
              GPU L#17 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039422 NVIDIAUUID=GPU-89053185-7a14-cdc7-c89f-9a69b64cef0a) "nvml4"
          Bridge PCI->PCI (P#160000 busid=0000:27:10.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[2b-2b])
            PCI 10de:1021 (P#176128 busid=0000:2b:00.0 class=0302(3D) link=8.00GB/s)
              Co-Processor L#18 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda5"
GPUModel="Tesla K20Xm") "cuda5" GPU L#19 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039702 NVIDIAUUID=GPU-20a32c55-de79-c7b0-74ed-cbbc9fc2bfee) "nvml5" Bridge PCI->PCI (P#131120 busid=0000:20:03.0 id=8086:3c08 class=0604(PCI_B) link=15.75GB/s buses=0000:[2e-35]) Bridge PCI->PCI (P#188416 busid=0000:2e:00.0 id=10b5:8747 class=0604(PCI_B) link=15.75GB/s buses=0000:[2f-35]) Bridge PCI->PCI (P#192640 busid=0000:2f:08.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[30-30]) PCI 10de:1021 (P#196608 busid=0000:30:00.0 class=0302(3D) link=8.00GB/s) Co-Processor L#20 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda6" GPU L#21 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039633 NVIDIAUUID=GPU-d24b7e36-3a28-f787-4497-c43356a7ff2d) "nvml6" Bridge PCI->PCI (P#192768 busid=0000:2f:10.0 id=10b5:8747 class=0604(PCI_B) link=4.00GB/s buses=0000:[33-33]) PCI 10de:1021 (P#208896 busid=0000:33:00.0 class=0302(3D) link=8.00GB/s) Co-Processor L#22 (CoProcType=CUDA Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm") "cuda7" GPU L#23 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="Tesla K20Xm" NVIDIASerial=0320413039548 NVIDIAUUID=GPU-01fa129f-f63c-2542-d9fc-ad6dfe3e9467) "nvml7" depth 0: 1 Machine (type #1) depth 1: 2 NUMANode (type #2) depth 2: 2 Socket (type #3) depth 3: 2 L3Cache (type #4) depth 4: 16 L2Cache (type #4) depth 5: 16 L1dCache (type #4) depth 6: 16 L1iCache (type #4) depth 7: 16 Core (type #5) depth 8: 16 PU (type #6) Special depth -3: 24 Bridge (type #9) Special depth -4: 14 PCI Device (type #10) Special depth -5: 24 OS Device (type #11) latency matrix between NUMANodes (depth 1) by logical indexes: index 0 1 0 1.000 2.000 1 2.000 1.000 > > Brice > > > >> >> PCIBridge >> PCIBridge >> PCIBridge >> PCI 10de:1021 >> CoProc L#2 "cuda0" >> GPU L#3 "nvml2" >> PCIBridge >> PCI 10de:1021 >> CoProc L#4 "cuda1" >> GPU L#5 "nvml3" >> PCIBridge >> PCIBridge >> PCIBridge >> PCI 10de:1021 >> CoProc L#6 "cuda2" >> GPU L#7 "nvml0" >> PCIBridge >> PCI 10de:1021 >> CoProc L#8 "cuda3" >> GPU L#9 "nvml1" >> >> >> Right now I am trying to create a python script that will take the XML >> output of lstopo and give me just the cuda and nvml devices in order. >> >> I dont' know if some value are deterministic though. Could I ignore the >> CoProc line and just use the: >> >> GPU L#3 "nvml2" >> GPU L#5 "nvml3" >> GPU L#7 "nvml0" >> GPU L#9 "nvml1" >> >> Is the L# always going to be in the oder I would expect? Because then I >> already have my map then. >> > > > > Brice > > >> >> Brock Palen >> >> www.umich.edu/~brockp >> >> CAEN Advanced Computing >> XSEDE Campus Champion >> >> bro...@umich.edu >> >> (734)936-1985 >> >> >> >> On Feb 5, 2014, at 1:19 AM, Brice Goglin >> <brice.gog...@inria.fr> >> wrote: >> >> >>> Hello Brock, >>> >>> Some people reported the same issue in the past and that's why we added the >>> "nvml" objects. CUDA reorders devices by "performance". Batch-schedulers >>> are somehow supposed to use "nvml" for managing GPUs without actually using >>> them with CUDA directly. And the "nvml" order is the "normal" order. >>> >>> You need "tdk" ( >>> https://developer.nvidia.com/tesla-deployment-kit >>> ) to get nvml library and development headers installed. Then hwloc can >>> build its "nvml" backend. Once ready, you'll see a hwloc "cudaX" and a >>> hwloc "nvmlY" object in each NVIDIA PCI devices, and you can get their >>> locality as usual. 
>>>
>>> Does this help?
>>>
>>> Brice
>>>
>>> On 05/02/2014 05:25, Brock Palen wrote:
>>>
>>>> We are trying to build a system to restrict users to the GPUs they
>>>> were assigned by our batch system (Torque).
>>>>
>>>> The batch system sets the GPUs into thread-exclusive mode when they
>>>> are assigned to a job, so we want the GPU that the batch system
>>>> assigns to be the one set in CUDA_VISIBLE_DEVICES.
>>>>
>>>> The problem is that on our nodes, what the batch system sees as GPU 0
>>>> is not the same GPU that CUDA_VISIBLE_DEVICES sees as 0. Actually 0
>>>> is 2.
>>>>
>>>> You can see this behavior if you run nvidia-smi and look at the PCI
>>>> IDs of the devices. You can then look at the PCI IDs output by
>>>> deviceQuery from the SDK examples and see that they are in a
>>>> different order.
>>>>
>>>> The IDs you would set in CUDA_VISIBLE_DEVICES match the order that
>>>> deviceQuery sees, not the order that nvidia-smi sees.
>>>>
>>>> Example (all values converted to decimal to match deviceQuery):
>>>>
>>>> nvidia-smi order:  9, 10, 13, 14, 40, 43, 48, 51
>>>> deviceQuery order: 13, 14,  9, 10, 40, 43, 48, 51
>>>>
>>>> Can hwloc help me with this? Right now I am hacking a script based on
>>>> the output of the two commands, making a map between the two, and
>>>> then setting CUDA_VISIBLE_DEVICES.
>>>>
>>>> Any ideas would be great. Later, as we currently also use CPU sets,
>>>> we want to pass GPU locality information to the scheduler so it can
>>>> make decisions that match GPUs to CPU sockets, as performance of
>>>> threads across QPI domains is very poor.
>>>>
>>>> Thanks
>>>>
>>>> Machine (64GB)
>>>>   NUMANode L#0 (P#0 32GB)
>>>>     Socket L#0 + L3 L#0 (20MB)
>>>>       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>>>       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>>>       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>>>       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>>>       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>>>>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>>>>       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>>>>       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>>>>     HostBridge L#0
>>>>       PCIBridge
>>>>         PCI 1000:0087
>>>>           Block L#0 "sda"
>>>>           Block L#1 "sdb"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#2 "cuda0"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#3 "cuda1"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#4 "cuda2"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#5 "cuda3"
>>>>       PCIBridge
>>>>         PCI 8086:1521
>>>>           Net L#6 "eth0"
>>>>         PCI 8086:1521
>>>>           Net L#7 "eth1"
>>>>       PCIBridge
>>>>         PCI 102b:0533
>>>>       PCI 8086:1d02
>>>>   NUMANode L#1 (P#1 32GB)
>>>>     Socket L#1 + L3 L#1 (20MB)
>>>>       L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>>>>       L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>>>>       L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>>>>       L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>>>>       L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>>>>       L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>>>>       L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>>>>       L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>>>>     HostBridge L#12
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 15b3:1003
>>>>               Net L#8 "eth2"
>>>>               Net L#9 "ib0"
>>>>               Net L#10 "eoib0"
>>>>               OpenFabrics L#11 "mlx4_0"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#12 "cuda4"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#13 "cuda5"
>>>>       PCIBridge
>>>>         PCIBridge
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#14 "cuda6"
>>>>           PCIBridge
>>>>             PCI 10de:1021
>>>>               CoProc L#15 "cuda7"
>>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> bro...@umich.edu
>>>> (734)936-1985
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
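For anyone scripting this: below is a minimal sketch of the mapping tool
discussed above, in the spirit of the Python/lstopo-XML approach Brock
describes. It pairs each "cudaX" OS device with the "nvmlY" OS device
that hwloc places under the same NVIDIA PCI device, then translates the
batch system's GPU indices (which follow the nvml/PCI/nvidia-smi order)
into the values to export in CUDA_VISIBLE_DEVICES (which follow the CUDA
runtime's order). It assumes hwloc was built with both the CUDA and NVML
backends, that "lstopo --of xml" prints the topology XML on stdout, and
that the XML uses type="PCIDev"/type="OSDev" object attributes as in
hwloc 1.x; the function and variable names here are illustrative only,
not part of hwloc or Torque.

#!/usr/bin/env python
# Hedged sketch, not a tested tool: build the nvml->cuda index map from
# hwloc's XML export and print a CUDA_VISIBLE_DEVICES line for the GPUs
# a PBS job was assigned.

import subprocess
import xml.etree.ElementTree as ET


def cuda_index_by_nvml_index():
    """Return a dict like {0: 2, 1: 3, ...} mapping nvml index -> cuda index."""
    # Assumes this lstopo invocation writes the XML export to stdout.
    xml = subprocess.check_output(["lstopo", "--of", "xml"])
    mapping = {}
    for obj in ET.fromstring(xml).iter("object"):
        if obj.get("type") != "PCIDev":
            continue
        # OS device names hwloc attached to this PCI device, e.g. cuda0/nvml2.
        names = [child.get("name", "") for child in obj.iter("object")
                 if child.get("type") == "OSDev"]
        cuda = [n for n in names if n.startswith("cuda")]
        nvml = [n for n in names if n.startswith("nvml")]
        if cuda and nvml:
            mapping[int(nvml[0][4:])] = int(cuda[0][4:])
    return mapping


if __name__ == "__main__":
    # Illustrative input: PBS hands out GPU indices in nvml/PCI order.
    assigned_by_pbs = [0, 1]
    m = cuda_index_by_nvml_index()
    print("CUDA_VISIBLE_DEVICES=" +
          ",".join(str(m[i]) for i in assigned_by_pbs))

On the node shown above this would print CUDA_VISIBLE_DEVICES=2,3 when
PBS assigns GPUs 0 and 1, matching the observation that the batch
system's GPU 0 is the CUDA runtime's device 2.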