Do you know if the metrics are actually being collected?

An easy way to test is to use netcat or telnet to connect to gmond on
the compute node with the NVIDIA card:

  nc node123 8649

That should dump a bunch of XML, and you can search that for the
metrics generated by the plugin.  If you find them there, check the
gmetad process similarly (note the different port!):

  nc headnode 8651
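
If you would rather script the check than eyeball the XML, here is a
rough Python sketch of the same idea. The helper names are made up for
illustration, and the "gpu" substring is only a guess at how the plugin
names its metrics, so adjust it to whatever nvidia.py actually reports.
It assumes the usual Ganglia XML layout (HOST elements containing
METRIC elements with NAME and VAL attributes).

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port, timeout=10.0):
    """Read the whole XML dump the daemon writes when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def find_metrics(xml_bytes, needle):
    """Return (host, metric, value) tuples for metric names containing needle."""
    root = ET.fromstring(xml_bytes)
    hits = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if needle in name:
                hits.append((host.get("NAME"), name, metric.get("VAL")))
    return hits


# Example (host and port taken from the nc commands above; "gpu" is assumed):
#   for row in find_metrics(fetch_xml("node123", 8649), "gpu"):
#       print(*row)
```

An empty result from node123:8649 would suggest the module is not
loading on the node at all; hits there but none from headnode:8651
would point at the gmond-to-gmetad leg instead.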

Also check the web front end for the simple per-metric charts (the
boring grey ones).  It's possible that the metrics are collected and
sent, but are not rendering properly for some reason.



On Tue, Apr 19, 2016 at 6:15 PM, Jeff White <jeff.wh...@wsu.edu> wrote:
> I'm trying to get this nvidia module working on CentOS 7:
>
> https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
>
> I did the following:
>
> * Installed CUDA on node but NOT on the Ganglia server
>
> * Installed nvidia-ml-py-7.352.0.tar.gz on both node and server
>
> * put nvidia.py into /usr/lib64/ganglia/python_modules/ on both node and
> server
>
> * put nvidia.pyconf in /etc/ganglia/conf.d/ on the node only (and
> verified gmond.conf includes that directory)
>
> * put related .php files in /usr/share/ganglia/ on server only
>
> * restarted everything everywhere
>
> What did I do wrong?  Logs are not saying anything useful.  The graphs
> just don't show up.  No error, nothing, just doesn't work.
>
> --
> Jeff White
> HPC Systems Engineer
> Information Technology Services - WSU
>
>
> ------------------------------------------------------------------------------
> Find and fix application performance issues faster with Applications Manager
> Applications Manager provides deep performance insights into multiple tiers of
> your business applications. It resolves application problems quickly and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general



-- 
Jesse Becker
