Before trying the instructions posted at http://developer.nvidia.com/ganglia-monitoring-system on one of our Rocks 5.4.2 clusters, which has 2 GPU cards in every compute node, I tried them out on a standalone Linux workstation running RHEL 6.2 (no Rocks). Notes from that attempt are posted here: http://sgowtham.net/blog/2012/02/11/ganglia-gmond-python-module-for-monitoring-nvidia-gpu/

Now that I know it works as explained, I'd like to try this out on the aforementioned Rocks 5.4.2 cluster with GPUs. The Python bindings for the NVIDIA Management Library (http://pypi.python.org/pypi/nvidia-ml-py/) require a Python newer than 2.4. Following Phil's instructions in a recent email, I installed Python 2.7 and 3.x, and used that to install these Python bindings for NVML.

I then followed the instructions on the 'Ganglia/gmond python modules' page (https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia): 'nvidia_smi.py' and 'pynvml.py' were copied to /opt/ganglia/lib64/ganglia/python_modules/, and so on.

For some reason, the Ganglia metrics do not include any GPU-related information from the compute nodes. If any of you have tried this on your cluster and got it to work, I'd greatly appreciate some direction.

Thanks for your time and help.

Best,
g

--
Gowtham
Information Technology Services
Michigan Technological University
(906) 487-3593
http://www.it.mtu.edu/
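P.S. In case it helps with debugging, below is a quick standalone check I would run on one of the compute nodes. It is only a sketch, and it assumes the nvidia-ml-py bindings are importable by the same Python interpreter that gmond embeds for its python modules; it simply lists each GPU and its temperature via NVML. If this prints the GPUs but gmond still reports no GPU metrics, the problem is more likely in the module/conf setup than in NVML itself.

  # Sketch: verify the NVML Python bindings can see the GPUs.
  # Assumption: nvidia-ml-py is installed for the Python interpreter
  # that gmond's python module support actually uses.
  from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                      nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
                      nvmlDeviceGetTemperature, NVML_TEMPERATURE_GPU)

  nvmlInit()
  try:
      for i in range(nvmlDeviceGetCount()):
          handle = nvmlDeviceGetHandleByIndex(i)
          # Print index, name, and current GPU core temperature
          print("GPU %d: %s, %d C" % (i, nvmlDeviceGetName(handle),
                nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)))
  finally:
      nvmlShutdown()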
on one of our Rocks 5.4.2 clusters that has 2 GPU cards in every compute node, I tried them out on a standalone linux workstation, running RHEL 6.2 (no Rocks). Notes from that attempt are posted here: http://sgowtham.net/blog/2012/02/11/ganglia-gmond-python-module-for-monitoring-nvidia-gpu/ Now that I know it works as explained, I'd like to try this out on the aforementioned Rocks 5.4.2 cluster with GPUs. Python bindings for the NVIDIA Management Library http://pypi.python.org/pypi/nvidia-ml-py/ requires Python to be newer than 2.4 - following Phil's instructions in a recent email, I got Python 2.7 and 3.x to install; and used that to get these Python bindings for NVML to install. I then followed the instructions in 'Ganglia/gmond python modules' page https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia 'nvidia_smi.py' and 'pynvml.py' were copied to /opt/ganglia/lib64/ganglia/python_modules/ and so on. For some reason, the Ganglia metrics do not include any GPU related information from the compute nodes. If any of you have tried this on your cluster and got it to work, I'd greatly appreciate some direction. Thanks for your time and help. Best, g -- Gowtham Information Technology Services Michigan Technological University (906) 487/3593 http://www.it.mtu.edu/ ------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general