bffffff... painful answer.

Is there anything easy that the cluster administrators could do? How could I persuade them that it is an easy task?
:)

Thanks,
Albert

On Thu 16 Feb 2012 14:18:07 GMT, Brice Goglin wrote:
Your machine has a buggy BIOS: it reports empty locality information for PCI devices. That's why the hwloc cpuset is empty as well. I guess we should detect this case and return the entire machine's cpuset instead.

Something like this should help:

Index: include/hwloc/cuda.h
===================================================================
--- include/hwloc/cuda.h (revision 4302)
+++ include/hwloc/cuda.h (working copy)
@@ -92,6 +92,8 @@
return -1;

hwloc_linux_parse_cpumap_file(sysfile, set);
+ if (hwloc_bitmap_iszero(set))
+ hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));

fclose(sysfile);
#else
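
Until this fix makes it into a release, the same fallback can be applied on the caller's side. A rough sketch using hwloc's public API (the helper name is made up, not part of hwloc):

    /* Sketch: query a CUDA device's cpuset and fall back to the
     * machine's complete cpuset when a buggy BIOS reports an empty one. */
    #include <hwloc.h>
    #include <hwloc/cuda.h>
    #include <cuda.h>

    static int get_device_cpuset_with_fallback(hwloc_topology_t topology,
                                               CUdevice dev,
                                               hwloc_cpuset_t set)
    {
      if (hwloc_cuda_get_device_cpuset(topology, dev, set) < 0)
        return -1;
      /* An empty set means the BIOS reported no locality info. */
      if (hwloc_bitmap_iszero(set))
        hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
      return 0;
    }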


Brice



Le 16/02/2012 15:09, Albert Solernou a écrit :
Hi Brice,
I attach a tar ball with the outputs.

It may also be relevant that I am running hwloc on a cluster; this is the output from a node with two GPU cards.

Thank you,
Albert

On 16/02/12 13:56, Brice Goglin wrote:
Hello Albert,
Does lstopo show PCI devices properly?
Can you send these outputs?
lstopo -.xml
and
for i in /sys/bus/pci/devices/* ; do echo -n "$i " ; cat $i/local_cpus ; done
Brice



Le 16/02/2012 14:28, Albert Solernou a écrit :
Hi,
I am receiving cpuset 0x0 when I call hwloc_cuda_get_device_cpuset.
The exact output of tests/cuda.c is:
got cpuset 0x0 for device 0
got cpuset 0x0 for device 1
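
For reference, the test boils down to this call sequence (a sketch of what tests/cuda.c does, not the actual file):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>
    #include <hwloc.h>
    #include <hwloc/cuda.h>

    int main(void)
    {
      hwloc_topology_t topology;
      int i, count;

      cuInit(0);
      cuDeviceGetCount(&count);

      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      for (i = 0; i < count; i++) {
        CUdevice dev;
        hwloc_cpuset_t set = hwloc_bitmap_alloc();
        cuDeviceGet(&dev, i);
        if (!hwloc_cuda_get_device_cpuset(topology, dev, set)) {
          char *s;
          /* Print the cpuset as a hex string, e.g. "0x0". */
          hwloc_bitmap_asprintf(&s, set);
          printf("got cpuset %s for device %d\n", s, i);
          free(s);
        }
        hwloc_bitmap_free(set);
      }

      hwloc_topology_destroy(topology);
      return 0;
    }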


I have tried hwloc 1.3 and 1.4, using the GNU and Intel compilers. I am on a ROCKS cluster with two NVIDIA C2050 GPU cards. Everything else seems to be working fine... What could I check?
What information do you need to help me?

Thank you,
Albert

