Before trying the instructions posted at

   http://developer.nvidia.com/ganglia-monitoring-system

on one of our Rocks 5.4.2 clusters, which has two GPU
cards in every compute node, I first tried them out on
a standalone Linux workstation running RHEL 6.2
(no Rocks). Notes from that attempt are posted here:

   http://sgowtham.net/blog/2012/02/11/ganglia-gmond-python-module-for-monitoring-nvidia-gpu/


Now that I know it works as explained, I'd like
to try this out on the aforementioned Rocks 5.4.2
cluster with GPUs.


The Python bindings for the NVIDIA Management Library (NVML),

   http://pypi.python.org/pypi/nvidia-ml-py/

require Python newer than 2.4. Following Phil's instructions
from a recent email, I installed Python 2.7 (and 3.x), and then
used the newer Python to install these NVML bindings.
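
As a sanity check on each GPU node, the bindings can be
exercised by hand (a minimal sketch using documented pynvml
calls; run it with the same Python 2.7 the bindings were
installed under):

   import pynvml

   # initialize NVML, report the driver, and list every GPU
   pynvml.nvmlInit()
   print(pynvml.nvmlSystemGetDriverVersion())
   for i in range(pynvml.nvmlDeviceGetCount()):
       handle = pynvml.nvmlDeviceGetHandleByIndex(i)
       print(pynvml.nvmlDeviceGetName(handle))
   pynvml.nvmlShutdown()

If this fails, the problem is with the driver or the bindings,
not with Ganglia.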


I then followed the instructions on the 'Ganglia/gmond
python modules' page

   https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia


'nvidia_smi.py' and 'pynvml.py' were copied to

   /opt/ganglia/lib64/ganglia/python_modules/

and so on.
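
To rule out the module itself, it can also be exercised outside
of gmond on a compute node. A sketch, assuming the module file
from the repository is 'nvidia.py' and that it follows the
standard gmond python-module interface (metric_init() returning
a list of metric descriptors):

   import sys
   sys.path.insert(0, '/opt/ganglia/lib64/ganglia/python_modules')

   import nvidia  # the GPU gmond module; file name assumed

   # each descriptor carries the metric 'name' and a 'call_back'
   # that returns the current value for that metric
   for d in nvidia.metric_init({}):
       print('%s = %s' % (d['name'], d['call_back'](d['name'])))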

For some reason, the Ganglia metrics do not include
any GPU-related information from the compute nodes.
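
For what it is worth, this is how I have been checking: polling
gmond's XML dump on a compute node and grepping for GPU metrics
(a sketch, assuming the default gmond TCP port 8649 and a
Rocks-style node name such as compute-0-0):

   import socket

   # gmond answers a plain TCP connection with its full metric XML
   sock = socket.create_connection(('compute-0-0', 8649))
   chunks = []
   while True:
       data = sock.recv(4096)
       if not data:
           break
       chunks.append(data)
   sock.close()

   for line in ''.join(chunks).splitlines():
       if 'gpu' in line.lower():
           print(line.strip())

The XML shows all the usual host metrics, but nothing GPU-related.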

If any of you have tried this on your cluster and got it
to work, I'd greatly appreciate some direction.

Thanks for your time and help.

Best,
g

--
Gowtham
Information Technology Services
Michigan Technological University

(906) 487/3593
http://www.it.mtu.edu/

