Re: [Ganglia-general] Monitoring CTX switches and memory fragmentation
Hi Vladimir,

is the CTX stuff already in a released version? I may need to tell the end customer to upgrade.

Cheers
Martin

On Tue, May 5, 2015 at 4:12 PM, Vladimir Vuksan vli...@veus.hr wrote:
> I have written one for memory fragmentation. You can find it here:
> https://github.com/ganglia/gmond_python_modules/tree/master/system/mem_fragmentation
>
> The context switch stuff is now in the monitor-core master:
> https://github.com/ganglia/monitor-core/blob/master/gmond/python_modules/cpu/cpu_stats.py
>
> Vladimir

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
Re: [Ganglia-general] Monitoring CTX switches and memory fragmentation
I have written one for memory fragmentation. You can find it here:
https://github.com/ganglia/gmond_python_modules/tree/master/system/mem_fragmentation

The context switch stuff is now in the monitor-core master:
https://github.com/ganglia/monitor-core/blob/master/gmond/python_modules/cpu/cpu_stats.py

Vladimir

On 05/05/2015 02:49 AM, Martin Knoblauch wrote:
> Hi friends,
>
> short question: does Ganglia provide monitor agents for context switches and "memory fragmentation" (e.g. listing the contents of /proc/buddyinfo)? I want to avoid duplicate work if they already exist officially.
>
> Cheers
> Martin
> --
> Martin Knoblauch
> email: k n o b i AT knobisoft DOT de
> www: http://www.knobisoft.de
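For readers who only want to see what /proc/buddyinfo holds before installing the module, here is a minimal parsing sketch. It is not the code of the mem_fragmentation module linked above; the function names are made up for illustration, and it assumes the usual Linux layout of one line per zone ("Node N, zone NAME" followed by free-block counts for order 0, 1, 2, ...).

```python
# Sketch only: parse /proc/buddyinfo text into per-zone free-block counts.
# parse_buddyinfo, free_pages_at_or_above and read_buddyinfo are hypothetical
# names, not taken from the Ganglia mem_fragmentation module.


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node 0, zone DMA 1 0 1 ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zone = fields[3]
        zones[(node, zone)] = [int(n) for n in fields[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Free pages held in blocks of at least the given order.

    A block of order i spans 2**i pages, so few pages in high orders
    is a rough sign of fragmentation.
    """
    return sum(n << i for i, n in enumerate(counts) if i >= order)


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_buddyinfo(handle.read())
```

On a Linux host, `read_buddyinfo()` returns the live counts; comparing `free_pages_at_or_above(counts, 0)` with the same call at a higher order gives a crude view of how much free memory is still available in large contiguous blocks.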
Re: [Ganglia-general] Monitoring CTX switches and memory fragmentation
Indeed it's in 3.7.1

Vladimir

On 05/05/2015 11:24 AM, Martin Knoblauch wrote:
> is the CTX stuff already in a released version? I may need to tell the end customer to upgrade.
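As background on what a context switch metric measures, the sketch below reads the cumulative `ctxt` counter that Linux exposes in /proc/stat and turns two samples into a per-second rate. It is an illustration under that assumption, not the cpu_stats.py code that ships with 3.7.1, and the function names are hypothetical.

```python
# Sketch only: derive a context-switch rate from the "ctxt" line of /proc/stat.
# parse_ctxt and ctxt_rate are hypothetical helper names.


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(prev_count, prev_time, count, now):
    """Context switches per second between two samples.

    Returns 0.0 if no time has passed or the counter went backwards
    (for example after a reboot).
    """
    elapsed = now - prev_time
    if elapsed <= 0 or count < prev_count:
        return 0.0
    return (count - prev_count) / elapsed
```

A collector would call `parse_ctxt(open("/proc/stat").read())` at each interval, keep the previous count and timestamp, and report the value of `ctxt_rate`.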
Re: [Ganglia-general] Monitoring IBM LSF Platform and GPFS
Hi Waleed,

I recently wrote a Python LSF module for my last contract. It reported metrics on the jobs submitted to LSF, as opposed to monitoring LSF itself (sbatchd, lim, res etc.). Is this what you want? If so, I could ask whether the module could be made available.

Regards
Paul

On 11 December 2012 15:11, Waleed Harbi waleed.ha...@gmail.com wrote:
> Hello,
>
> I am looking for a Ganglia gmetric script to monitor IBM LSF Platform and GPFS. I would highly appreciate your advice if you have any comments. I cannot find one under https://github.com/ganglia/gmetric.
>
> --
> Best Wishes,
> Waleed Harbi
> Dream | Do | Be

--
Tel: +44 79 8532 7353
LinkedIn: http://uk.linkedin.com/pub/paul-hewlett/0/629/9b4
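Paul's module is not included in the thread. Purely as an illustration of what a job-oriented gmond Python module could look like, the sketch below counts jobs by status and exposes the counts through the standard `metric_init` / `metric_cleanup` interface. Everything specific here is an assumption: the metric names, the use of `bjobs -u all`, and the expectation that the third column of its output is the job status. It is written for Python 3; a gmond build that embeds Python 2 would need a different way to run the command.

```python
# Sketch only: a hypothetical gmond Python module that reports LSF job counts.
# Assumes `bjobs -u all` prints one job per line with the status (RUN, PEND, ...)
# in the third column. Metric names are invented for this example.
import subprocess
from collections import Counter


def fetch_bjobs_output():
    """Run bjobs for all users and return its standard output as text."""
    result = subprocess.run(
        ["bjobs", "-u", "all"], capture_output=True, text=True, check=False
    )
    return result.stdout


def count_by_status(bjobs_text):
    """Count jobs per status, skipping the header and any non-job lines."""
    counts = Counter()
    for line in bjobs_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] += 1
    return counts


def job_count(name):
    """gmond callback: 'lsf_jobs_run' -> number of jobs in status RUN."""
    status = name.rsplit("_", 1)[-1].upper()
    return count_by_status(fetch_bjobs_output())[status]


def metric_init(params):
    """Return one metric descriptor per job status of interest."""
    return [
        {
            "name": "lsf_jobs_" + status,
            "call_back": job_count,
            "time_max": 90,
            "value_type": "uint",
            "units": "jobs",
            "slope": "both",
            "format": "%u",
            "description": "LSF jobs in status " + status.upper(),
            "groups": "lsf",
        }
        for status in ("run", "pend")
    ]


def metric_cleanup():
    """Nothing to release in this sketch."""
    pass
```

Running `bjobs` on every callback keeps the sketch short; a real module would more likely cache the output for a few seconds so that several metrics share one invocation.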
Re: [Ganglia-general] Monitoring IBM LSF Platform and GPFS
Paul,

That's really interesting. I would appreciate it if you could share it.

--
Best Wishes,
Waleed Harbi
Dream | Do | Be

On Wed, Dec 12, 2012 at 7:09 PM, Paul Hewlett phewlet...@gmail.com wrote:
> Hi Waleed,
>
> I recently wrote a Python LSF module for my last contract. It reported metrics on the jobs submitted to LSF, as opposed to monitoring LSF itself (sbatchd, lim, res etc.). Is this what you want? If so, I could ask whether the module could be made available.
>
> Regards
> Paul
Re: [Ganglia-general] Monitoring IBM LSF Platform and GPFS
What are you looking to monitor? Queue sizes?

Vladimir

On Tue, 11 Dec 2012, Waleed Harbi wrote:
> Hello,
>
> I am looking for a Ganglia gmetric script to monitor IBM LSF Platform and GPFS. I would highly appreciate your advice if you have any comments. I cannot find one under https://github.com/ganglia/gmetric.
>
> --
> Best Wishes,
> Waleed Harbi
> Dream | Do | Be
Re: [Ganglia-general] Monitoring IBM LSF Platform and GPFS
I am looking at performance tuning for GPFS and LSF hosts; if more functionality is available, that would be great. Both are big products, but I am mainly looking for performance-related functions.

--
Best Wishes,
Waleed Harbi
Dream | Do | Be

On Tue, Dec 11, 2012 at 7:07 PM, Vladimir Vuksan vli...@veus.hr wrote:
> What are you looking to monitor? Queue sizes?
>
> Vladimir
Re: [Ganglia-general] Monitoring processes
> Message: 5
> Date: Wed, 25 Jul 2012 11:30:27 -0500
> From: Douglas Wagner dougla...@gmail.com
> Subject: Re: [Ganglia-general] Modifying ganglia.
> To: ganglia-general@lists.sourceforge.net
>
> On Wed, Jul 25, 2012 at 2:22 AM, karthik karthikraj.palanik...@gmail.com wrote:
>> Hi,
>> I have built an application. I need to make Ganglia monitor that application. Can someone help me with how to modify Ganglia? What are the steps involved?
>> Thanks.

Hi Karthik,

There is a Ganglia plugin, 'procstat', which comes as standard with Ganglia. Currently it only measures CPU usage and memory usage for itself (gmond) and for other tasks, which you can define via a regex. There are examples for apache etc. I am using it to monitor LSF. It should not be difficult to extend it to measure other process parameters.

Here is the conf:

    #
    modules {
      module {
        name = 'procstat'
        language = 'python'
        param gmond { value = '/gmond/' }
        param res { value = '/res/' }
        param sbatchd { value = '/sbatchd/' }
        param lim { value = '/lim/' }
        param pim { value = '/pim/' }
        param melim { value = '/melim/' }
        /*
        param mbatchd { value = '/mbatchd/' }
        param mbschd { value = '/mbschd/' }
        param bld { value = '/bld/' }
        */
        /*
        param httpd { value = '/var/run/httpd.pid' }
        param mysqld { value = '/\/usr\/libexec\/mysqld/' }
        param splunk { value = '/splunkd.*start/' }
        param splunk-web { value = '/twistd.*SplunkWeb/' }
        */
      }
    }

    collection_group {
      collect_every = 1
      time_threshold = 30
      metric { name_match = procstat_(.+)_cpu }
      metric { name_match = procstat_(.+)_mem }
    }

Regards
--
Paul Hewlett X25250
http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
ARM Ltd
110 Fulbourn Road, Cambridge, CB1 9NJ
Tel: +44 (0)1223 405923
skype: paul-at-arm
www.arm.com
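The param values in the conf above come in two shapes: a pattern wrapped in slashes (such as '/sbatchd/') and a plain path to a pid file (such as '/var/run/httpd.pid'). The sketch below illustrates that distinction and how a slash-wrapped pattern could be matched against a process listing. It is an interpretation of the conf, not procstat's actual code; the function names are hypothetical, and the input is assumed to look like the output of `ps -Ao pid,args`.

```python
# Sketch only: how a procstat-style param value might be interpreted.
# classify_param and matching_pids are hypothetical names; this is not the
# procstat module's implementation.
import re


def classify_param(value):
    """Return ('regex', pattern) for '/.../' values, else ('pidfile', path)."""
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return ("regex", value[1:-1])
    return ("pidfile", value)


def matching_pids(pattern, ps_lines):
    """Return PIDs whose command line matches the pattern.

    ps_lines is an iterable of 'PID ARGS' strings, for example the lines
    printed by `ps -Ao pid,args`. The header line is skipped because its
    first field is not a number.
    """
    regex = re.compile(pattern)
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and regex.search(parts[1]):
            pids.append(int(parts[0]))
    return pids
```

With a listing that contains `/opt/lsf/sbatchd` under PID 101, `matching_pids("sbatchd", lines)` would return `[101]`; CPU and memory figures for those PIDs would then feed the procstat_(.+)_cpu and procstat_(.+)_mem metrics named in the collection group.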
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
It sounds like you don't have any jobs actually running. Are you sure SGE is running?

On Mon, May 2, 2011 at 23:09, Mostafa Ismail mostafa.ism...@itworx.com wrote:
> Same result! I got no output; what does it mean?
>
> Thanks,
> Mostafa Ismail

--
Jesse Becker
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Hello Jesse,

Actually, I found that qstat doesn't return any output, except that with -f it displays all queues:

    [root@sge01 tmp]# qstat
    [root@sge01 tmp]# qstat -f
    queuename      qtype  resv/used/tot.  load_avg  arch        states
    -------------------------------------------------------------------
    all.q@pexe1    BIP    0/16/16         1.57      lx24-amd64
    -------------------------------------------------------------------
    all.q@pexe2    BIP    0/16/16         1.76      lx24-amd64
    -------------------------------------------------------------------
    all.q@pexe3    BIP    0/16/16         1.62      lx24-amd64
    ~ ~ ~
    [root@sge01 tmp]#

What does it mean?

Thanks,
Mostafa Ismail

-----Original Message-----
From: Jesse Becker [mailto:haw...@gmail.com]
Sent: Tuesday, April 19, 2011 7:17 PM
To: Bernard Li
Cc: Mostafa Ismail; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Monitoring SGE queues using Ganglia

Yeah, pretty close to the same file. I'll post updates to both the collector and the PHP file later on.

On Tue, Apr 19, 2011 at 13:10, Bernard Li bern...@vanhpc.org wrote:
> Hi Jesse:
>
> Is the script you have included the same as the one found here?
>
> https://github.com/ganglia/gmetric/tree/master/hpc/sge_jobs
>
> I didn't see the PHP file in the repo though. Perhaps you can update what's available in the repo and point users there in the future.
>
> Thanks!
>
> Bernard
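The `qstat -f` listing in this message already carries usable numbers: the resv/used/tot. column gives slot usage per queue instance. As a hedged sketch (the function name is hypothetical, and the column layout is assumed to match the output pasted above), the queue lines could be parsed like this:

```python
# Sketch only: extract slot usage from `qstat -f` queue lines of the form
#   all.q@pexe1 BIP 0/16/16 1.57 lx24-amd64
# parse_qstat_f is a hypothetical name; the layout is taken from the listing
# quoted in this thread.


def parse_qstat_f(text):
    """Return a list of dicts, one per queue instance line."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[2].count("/") != 2:
            continue
        try:
            resv, used, total = (int(n) for n in fields[2].split("/"))
        except ValueError:
            continue  # header line: "resv/used/tot." is not numeric
        try:
            load = float(fields[3])
        except ValueError:
            load = None  # load value not reported as a number
        queues.append(
            {
                "queue": fields[0],
                "reserved": resv,
                "used": used,
                "total": total,
                "load_avg": load,
            }
        )
    return queues
```

Summing the `used` and `total` fields over all entries gives cluster-wide slot usage, which could then be published as a pair of gmetric values.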
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Try running this:

    qstat -u '*'

Yes, you need the quotes.

On Mon, May 2, 2011 at 03:43, Mostafa Ismail mostafa.ism...@itworx.com wrote:
> Hello Jesse,
>
> Actually, I found that qstat doesn't return any output, except that with -f it displays all queues:
>
> [root@sge01 tmp]# qstat
> [root@sge01 tmp]# qstat -f
> queuename      qtype  resv/used/tot.  load_avg  arch        states
> -------------------------------------------------------------------
> all.q@pexe1    BIP    0/16/16         1.57      lx24-amd64
> -------------------------------------------------------------------
> all.q@pexe2    BIP    0/16/16         1.76      lx24-amd64
> -------------------------------------------------------------------
> all.q@pexe3    BIP    0/16/16         1.62      lx24-amd64
> ~ ~ ~
> [root@sge01 tmp]#
>
> What does it mean?
>
> Thanks,
> Mostafa Ismail

--
Jesse Becker
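The quotes matter because an unquoted `*` is expanded by the shell into the file names in the current directory before qstat ever sees it. A collector script written in Python can sidestep the issue by passing the arguments as a list, so that no shell is involved. The sketch below shows both points; the function name is hypothetical and it assumes `qstat` is on the PATH.

```python
# Sketch only: passing a literal "*" to qstat without shell glob expansion.
# run_qstat_all_users is a hypothetical helper name.
import shlex
import subprocess

# What the shell hands to qstat when the star is quoted: a literal "*".
QSTAT_ALL_USERS = shlex.split("qstat -u '*'")


def run_qstat_all_users():
    """Run qstat for all users and return its standard output as text.

    The argument list is executed directly (no shell), so "*" is not
    expanded into file names.
    """
    result = subprocess.run(
        QSTAT_ALL_USERS, capture_output=True, text=True, check=False
    )
    return result.stdout
```

An empty string from `run_qstat_all_users()` would correspond to the "no output" case discussed in this thread.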
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Same result! I got no output; what does it mean?

Thanks,
Mostafa Ismail

From: Jesse Becker [haw...@gmail.com]
Sent: Monday, May 02, 2011 3:02 PM
To: Mostafa Ismail
Cc: ganglia-general@lists.sourceforge.net; Bernard Li
Subject: Re: [Ganglia-general] Monitoring SGE queues using Ganglia

Try running this:

    qstat -u '*'

Yes, you need the quotes.
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
On Tue, Apr 19, 2011 at 09:25, Mostafa Ismail mostafa.ism...@itworx.com wrote:
> Hello,
>
> Is it possible to monitor the SGE queues (such as all.q) using Ganglia? I searched the “Ganglia-general” forum and found no match.

Yes, it is possible. You need to do two things:

1) collect the metrics from SGE.
2) graph the metrics.

Attached is a script (ganglia_sge) that I run from cron every few minutes to collect metrics, and a custom report that can display them.

--
Jesse Becker

Attachments:
ganglia_sge (binary data)
jobqueue_report.php
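The attached ganglia_sge script is not reproduced in the archive. As a rough illustration of the "collect" step only, the sketch below counts running and pending jobs from the text printed by `qstat -u '*'` and builds a gmetric command line for each count. It is not Jesse's script: the metric names and helper names are invented, and it assumes that job lines start with a numeric job ID and carry the state (r, qw, ...) in the fifth column.

```python
# Sketch only: summarize SGE job states and prepare gmetric calls.
# Not the ganglia_sge script from this thread; names are hypothetical.
from collections import Counter


def count_job_states(qstat_text):
    """Count jobs per state from `qstat -u '*'` output."""
    states = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job lines begin with a numeric job ID; header and rule lines do not.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[4]] += 1
    return states


def summarize(states):
    """Collapse raw states into running / pending / total."""
    running = sum(n for state, n in states.items() if state in ("r", "t"))
    pending = sum(n for state, n in states.items() if "qw" in state)
    return {"running": running, "pending": pending, "total": sum(states.values())}


def gmetric_argv(name, value):
    """Argument list for publishing one job-count metric with gmetric."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", "uint32",
        "--units", "jobs",
    ]
```

A cron-driven collector would feed the qstat output through `summarize(count_job_states(...))` and run the list returned by `gmetric_argv("sge_running", summary["running"])` (and likewise for pending) with `subprocess.run`.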
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Hello Jesse,

Thanks for the fast reply. I have some questions:

- I can't find the /etc/profile.d/sge.sh script that you're sourcing in ganglia_sge.
- Also, where should I put the PHP script?

Thanks,
Mostafa Ismail

-----Original Message-----
From: Jesse Becker [mailto:haw...@gmail.com]
Sent: Tuesday, April 19, 2011 3:39 PM
To: Mostafa Ismail
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Monitoring SGE queues using Ganglia

On Tue, Apr 19, 2011 at 09:25, Mostafa Ismail mostafa.ism...@itworx.com wrote:
> Hello,
>
> Is it possible to monitor the SGE queues (such as all.q) using Ganglia? I searched the Ganglia-general forum and found no match.

Yes, it is possible. You need to do two things:

1) collect the metrics from SGE.
2) graph the metrics.

Attached is a script (ganglia_sge) that I run from cron every few minutes to collect metrics, and a custom report that can display them.

--
Jesse Becker
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
The sge.sh is something that you should have already; it sets the various environment variables needed to run SGE programs. Specifically, it must set at least SGE_ROOT, SGE_CELL and SGE_ARCH, and should probably update your PATH as well (so the script can find the 'qstat' binary).

The PHP script goes in wwwroot/ganglia/graphs.d/, and you will need to edit the conf.php file to include jobqueue in the $optional_graphs list.

On Tue, Apr 19, 2011 at 09:56, Mostafa Ismail mostafa.ism...@itworx.com wrote:
> Hello Jesse,
>
> Thanks for the fast reply. I have some questions:
>
> - I can't find the /etc/profile.d/sge.sh script that you're sourcing in ganglia_sge.
> - Also, where should I put the PHP script?
>
> Thanks,
> Mostafa Ismail

--
Jesse Becker
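Before debugging the collector itself, it can help to confirm that the environment it runs in (for example under cron) really has the settings described above. The following sketch checks for SGE_ROOT, SGE_CELL and SGE_ARCH and for a `qstat` binary on the PATH. The function name is hypothetical, and the check is only a convenience, not part of the ganglia_sge script.

```python
# Sketch only: verify the environment a collector would need to run qstat.
# missing_sge_settings is a hypothetical helper name.
import os
import shutil

REQUIRED_VARS = ("SGE_ROOT", "SGE_CELL", "SGE_ARCH")


def missing_sge_settings(environ=None):
    """Return a list of problems; an empty list means the basics are in place.

    environ defaults to the current process environment, but any mapping
    can be passed in, which makes the check easy to try out.
    """
    env = os.environ if environ is None else environ
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    if shutil.which("qstat", path=env.get("PATH", "")) is None:
        problems.append("qstat not on PATH")
    return problems
```

Calling `missing_sge_settings()` from the same cron environment as the collector shows quickly whether sourcing sge.sh had the intended effect.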
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Thanks a lot, Jesse!

Everything is now in place except for editing conf.php; I don't know what I should change there. The script ran, but when I open the PHP page jobqueue_report.php I get nothing.

I'm sorry to bother you, but it would be great if there is any documentation I can follow; I'll get back to you if I still have issues.

Thanks,
Mostafa Ismail
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Hi Jesse:

Is the script you have included the same as the one found here?

https://github.com/ganglia/gmetric/tree/master/hpc/sge_jobs

I didn't see the PHP file in the repo though. Perhaps you can update what's available in the repo and point users there in the future.

Thanks!
Bernard
Re: [Ganglia-general] Monitoring SGE queues using Ganglia
Yeah, pretty close to the same file. I'll post updates to both the collector and the PHP file later on.

--
Jesse Becker
Re: [Ganglia-general] Monitoring
Ok. I just ran 'gstat --all' and only one host comes up, just the localhost. So something is missing. Any ideas?

-John

On Nov 17, 2009, at 9:22 AM, John Martyniak wrote:
> Hi everyone,
> Ok, I got my Ganglia monitor up and working, and it was pulling results from the localhost. So I enabled hadoop-metrics.properties and made the appropriate changes so that it pointed at my Ganglia box. I made a data_source in the gmetad.conf file and attached the two test nodes to it. I restarted gmond, gmetad and ganglia-web for good measure. But I am not seeing any results, and I am not seeing my data source; it says unspecified. Any ideas?
> Thanks in advance.
> -John
Re: [Ganglia-general] Monitoring
Try this command:

    # gstat --all -i a_hostname_in_cluster

regards,
Chifeng
Re: [Ganglia-general] Monitoring NFS share disk usage
On Tue, Oct 28, 2008 at 05:30:25PM +1100, Adam Mitchell wrote:
> #!/bin/bash
> VALUE=$(df /home/ | grep /home | awk '{print $3 }')
> gmetric --name disk_nfs_used --value $VALUE --type uint32 --units Bytes

not relevant for your problem, but the units here should be KB.

> gmond is running on the head node. However, there doesn't seem to be any rrd's being produced.

the rrds are generated by gmetad, which in turn reads the metrics from your gmond. for that to work on your setup you need to configure (/etc/gmond.conf) the gmond on your head node with the same cluster name (shiva) and collector as your work nodes, and then confirm that your gmetad is configured (/etc/gmetad.conf) to pull the status for your cluster (shiva), including your new metric.

a telnet to port 8649 on your collector (any gmond if using multicast, or the one that your gmetad is pointing to if using unicast) should dump an XML description of your cluster and include the new metric you just created with gmetric inside a host definition for your head node.

> I have added the following lines to the cluster_view.tpl
>
> <IMG HEIGHT=147 WIDTH=395 ALT="{cluster} DISK" SRC="./graph.php?c=shiva&amp;h=shiva.edag.cluster&amp;v=233.904&amp;m=disk_nfs_used&amp;r=hour&amp;z=medium&amp;jr=&amp;js=&amp;st=1225161877&vl=GB">
> </TD>

v, st, z and r are better pulled from the environment, as you will otherwise be hardcoding some of the values for your graph. since you are trying to import a metric graph into a cluster view, that might not work correctly anyway, so changes to graph.php might be needed too.

Carlo
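The check Carlo describes (dump the XML from port 8649 and look for the new metric under the head node) can be sketched as follows. This is an illustration under assumptions: the function name `hosts_reporting` is hypothetical, and the sample document is hand-written, with made-up IPs and values, to resemble the CLUSTER/HOST/METRIC nesting of a gmond dump; real output carries more attributes.

```python
import xml.etree.ElementTree as ET


def hosts_reporting(xml_text, metric_name):
    """Return {host name: value} for every host whose XML carries metric_name."""
    root = ET.fromstring(xml_text)
    found = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                found[host.get("NAME")] = metric.get("VAL")
    return found


# Hand-written sample shaped like a gmond dump; IPs and values are invented.
SAMPLE = """<GANGLIA_XML VERSION="3.1" SOURCE="gmond">
 <CLUSTER NAME="shiva">
  <HOST NAME="shiva.edag.cluster" IP="10.0.0.1">
   <METRIC NAME="disk_nfs_used" VAL="239517696" TYPE="uint32" UNITS="KB"/>
   <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=""/>
  </HOST>
  <HOST NAME="node01" IP="10.0.0.2">
   <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

print(hosts_reporting(SAMPLE, "disk_nfs_used"))
```

An empty result for the head node would suggest that the gmetric value never reached the collector, for example because the cluster name or send channel differs.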
Re: [Ganglia-general] Monitoring Linux Multipathed Devices
Craig Simpson wrote:
> Does anyone have a method for monitoring Linux multipathed devices, created by multipathd and dm?

Use udev to create /dev/ names that match your multipath names. On Red Hat, a rule in /etc/udev/rules.d and a script in /etc/udev/scripts should be sufficient.

http://www.redhat.com/magazine/002dec04/features/udev/
Re: [Ganglia-general] Monitoring Linux Multipathed Devices
I think a real trick for clustered storage is to understand the IO to multipathed devices and graph it over time. I am trying to gather and graph IO for my multipath aliases (and also send it to Ganglia).

I am using the Linux native multipathd for multipathing. It is working well.

    asm01 (1HITACHI_D60090910032) dm-6 HITACHI,DF600F
    [size=32G][features=0][hwhandler=0]
    \_ round-robin 0 [prio=1][active]
     \_ 0:0:1:32 sdm 8:192 [active][ready]
    \_ round-robin 0 [prio=0][enabled]
     \_ 0:0:0:32 sdb 8:16  [active][ready]

I am trying to figure out HOW I can get iostat -x information for the aliases. An example is asm01 above. The dm device that it points to can change (and does) between reboots.

I tried mapping asm01 to a raw device, called /dev/raw/asm01, but that doesn't seem to be something I can run iostat against either.

Has anyone figured out a way to do this? Really, IO stats are needed per alias.

Thanks!
Craig
Re: [Ganglia-general] monitoring a HA cluster
Alex,

Oh dear, it looks like I answered the wrong question *again*. As I don't have test access to a running Ganglia, someone else should answer. But part of it may be to:

- Configure gmetad.conf to poll the failover VIP IP or DNS name, not the physical ones.
- Configure each server in the failover pair to send UDP data to itself. Don't use 127.0.0.1 and don't use the floating VIP. Use an IP address that stays with the server.
- Arrange it so that the src address of the UDP being looped back maps to the same hostname string when doing a reverse DNS lookup on the gmond hosts. Note that the host names in the XML stream that goes to gmetad are generated by gmond doing a reverse DNS lookup on the src IPs of the UDP traffic. There are a few ways you may arrange this.

kind regards,
richard
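The reverse-lookup condition in the last point can be checked with a small script. This is a sketch, not part of Ganglia: the helper names are hypothetical, and the resolver is passed in as a parameter so the logic can be exercised without real DNS.

```python
import socket


def reverse_names(addresses, resolver=socket.gethostbyaddr):
    """Return {ip: hostname or None} using one reverse lookup per address."""
    names = {}
    for address in addresses:
        try:
            names[address] = resolver(address)[0]
        except OSError:
            names[address] = None  # no PTR record or lookup failure
    return names


def consistent(names):
    """True if every address resolved and all resolved to one hostname."""
    values = set(names.values())
    return None not in values and len(values) == 1


# Stand-in resolver with documentation-range addresses (made-up data).
FAKE_PTR = {"192.0.2.10": "nodea.example.org", "192.0.2.11": "nodea.example.org"}


def fake_resolver(address):
    """Mimic socket.gethostbyaddr for the addresses in FAKE_PTR."""
    if address not in FAKE_PTR:
        raise OSError("no PTR record")
    return (FAKE_PTR[address], [], [address])


print(consistent(reverse_names(["192.0.2.10", "192.0.2.11"], fake_resolver)))
```

Run on each gmond host with the default resolver, a False result would indicate that the same server could show up under more than one host name in the XML stream.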
Re: [Ganglia-general] monitoring a HA cluster
Alex,

Are they the only 2 members of the cluster? How about this:

- The gmond.conf on host A is configured for unicast and sends data to the *physical* address (not the VIP) of host B. Do not configure gmond.conf to send data to itself. The only UDP send channel is to host B.
- Configure the host B gmond.conf in the same way, to send its UDP data from host B to host A, and not to send to itself.
- Configure gmetad.conf to poll the floating VIP address of the cluster.

Now I have to say I am away from my computers, so this is a thought experiment. But at my previous company we did this for a failover pair of servers that were the head nodes of a larger cluster. It worked OK, I think.

I also seem to remember that the cygwin gmond behaves a little differently from the Linux one in the case where you do/don't want metrics sent to the host itself.

Let me know if this helps. :-)

regards,
Richard

Quoting [EMAIL PROTECTED]:
> Second post today, separate topic...
>
> I've got a few machines set up as active/passive clusters running heartbeat/drbd. I am currently monitoring them with ganglia, but I think the information I'm getting leads to a misleading picture. Since both machines are monitored, it looks like I have 8 processors in the cluster (4 each in 2 boxes). But in reality, only 1 of these machines is ever available at 1 time. I am keeping a mental note to myself that any time these clusters are more than 50% utilized, they're really 100% utilized, since the CPUs, RAM, etc. from the passive node really shouldn't count in the totals. Always having to drill down to the level of the individual machine to see what's going on is kind of a pain.
>
> The only solution I've thought of is to keep gmond turned off on the passive node, and to start it during a resource migration. This would be easy enough, but it would have 2 drawbacks:
>
> 1. My stats would say 50% of my cluster is 'down' although it's functioning correctly.
> 2. It is sometimes useful to monitor stuff on the passive node, and I don't really want to lose that ability.
>
> Any better ways to do this? Maybe extend the PHP frontend to be configurable for monitoring active/passive? (Would anyone else have a use for that besides me?)
>
> thanks,
> alex
Re: [Ganglia-general] Monitoring one process
I prepared some simple scripts for doing this. I run these commands every minute via cron:

    /usr/bin/gmetric -n _my_process_memory -tint16 -u% -v `/usr/local/bin/scripts/memory.pl my_process`
    /usr/bin/gmetric -n _my_process_cpu -tint16 -u% -v `/usr/local/bin/scripts/cpu.pl my_process`

memory.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Sum the %MEM column of every process with the given name.
    my $procname = $ARGV[0];
    my $proc_mem = 0;
    foreach my $line (`ps -C $procname -o %mem`) {
        next if $line =~ /%MEM/;    # skip the header row
        $proc_mem += $line;
    }
    print $proc_mem;

cpu.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Sum the %CPU column of every process with the given name.
    my $procname = $ARGV[0];
    my $proc_cpu = 0;
    foreach my $line (`ps -C $procname -o %cpu`) {
        next if $line =~ /%CPU/;    # skip the header row
        $proc_cpu += $line;
    }
    print $proc_cpu;

Rgds,
Vitaly

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:ganglia-[EMAIL PROTECTED] On Behalf Of João Oliveira
Sent: Friday, October 13, 2006 3:24 PM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Monitoring one process

> Hi all,
>
> I was reading the documentation's FAQ when I read about the metrics that Ganglia supports. I read all of them, trying to understand each, but I couldn't find the one that interests me the most: monitoring processes individually.
>
> So, can I collect the CPU usage time of one specific process using Ganglia? I mean, is it possible?
>
> Thanks in advance,
> João Oliveira
Re: [Ganglia-general] Monitoring one process
You may monitor whatever you like through the use of the gmetric command.
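As an illustration of publishing an arbitrary value through gmetric, here is a minimal sketch. The wrapper names `gmetric_argv` and `publish` are hypothetical; the long options (--name, --value, --type, --units) and the /usr/bin/gmetric path are the ones used elsewhere in these threads, and the metric name and value in the example are made up.

```python
import subprocess


def gmetric_argv(name, value, metric_type, units, gmetric="/usr/bin/gmetric"):
    """Build the argument list for one gmetric call (long-option form)."""
    return [gmetric, "--name", name, "--value", str(value),
            "--type", metric_type, "--units", units]


def publish(name, value, metric_type, units):
    """Run gmetric; raises CalledProcessError if it exits non-zero."""
    subprocess.run(gmetric_argv(name, value, metric_type, units), check=True)


# Only build and show the command here; publish() needs gmetric installed.
print(gmetric_argv("my_process_cpu", 12.5, "float", "%"))
```

Passing the arguments as a list avoids shell quoting problems with values such as "%".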
Re: [Ganglia-general] Monitoring one process
Hi,

I created this add-on. It allows you to collect metrics for one specific process using Ganglia.

http://www-usr.inf.ufsm.br/~veiga/gappmon/ (Portuguese only)

[]'s
-veiga

--
Marcelo Veiga Neves
http://www.inf.ufsm.br/~veiga/
Re: [Ganglia-general] monitoring
Nagios?

Cheers
Martin

--- Dirk Roessler [EMAIL PROTECTED] wrote:
> Does someone know an easy to install and easy to use solution for monitoring and sending email notifications of down nodes and health state on a Linux HPC cluster?
>
> Dirk

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-general] monitoring
Dirk Roessler wrote:
> Does someone know an easy to install and easy to use solution for monitoring and sending email notifications of down nodes and health state on a Linux HPC cluster?

You could use Nagios and the Ganglia Python client. Basically, you use the Ganglia Python client to get a metric value, then send an alert depending on that value. Setting up Nagios may not be easy, but it is definitely worth it in the long term.

Vladimir
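The "get a metric value, then alert depending on it" step could look roughly like the sketch below. It does not use the Ganglia Python client itself; the function name `check_metric` and the thresholds are hypothetical, and the only external convention relied on is the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_metric(value, warn, crit):
    """Return (exit_code, message) for a metric where higher is worse."""
    if value is None:
        return UNKNOWN, "UNKNOWN - metric not found"
    if value >= crit:
        return CRITICAL, "CRITICAL - value %.2f >= %.2f" % (value, crit)
    if value >= warn:
        return WARNING, "WARNING - value %.2f >= %.2f" % (value, warn)
    return OK, "OK - value %.2f" % value


# Placeholder reading; a real check would fetch this value from Ganglia.
code, message = check_metric(7.5, warn=5.0, crit=10.0)
print(code, message)
```

A real plugin would print the message and pass the code to sys.exit so that Nagios can act on it.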
Re: [Ganglia-general] Monitoring
leif-

i've been wanting to have a way to implement an active alerting mechanism for a while. the development team would love some help if you're willing to donate a little time.

i have an idea for a quick and smart hack (i think). gmetad is already doing the hardest part of this work. here's the trick...

on the machine running gmetad you'll find all the round-robin databases in /var/lib/ganglia/rrds (by default) in a nice hierarchy which can be used to query the information that gmetad has stored. the hierarchy looks like this...

    - root (most likely /var/lib/ganglia/rrds)
      |
      +-- __SummaryInfo__
      |   |
      |   + Metric foo.rrd
      |   + Metric bar.rrd
      |   ...
      |
      +-- Cluster 1
      |   |
      |   + __SummaryInfo__
      |   |   |
      |   |   + Metric foo.rrd
      |   |   + Metric bar.rrd
      |   |   + ...
      |   |
      |   + Host a on Cluster 1
      |   |   |
      |   |   + Metric foo.rrd
      |   |   + Metric bar.rrd
      |   |   ...
      |   |
      |   + Host b on Cluster 1
      |       |
      |       + Metric foo.rrd
      |       + Metric bar.rrd
      |       ...
      +-- Cluster 2
      ... etc etc etc

the __SummaryInfo__ directories are your friend because they contain the summary information for each metric at each level (grid, cluster and host). if you just do a find . in /var/lib/ganglia/rrds you'll see what i mean.

how do you get the data out of the round-robin databases? using rrdtool. here is a walk through on one of the Millennium monitoring machines

    # cd /var/lib/ganglia/rrds
    # ls
    CITRIS Pilot Cluster Millennium Cluster OceanStore WebServer Cluster __SummaryInfo__
    # cd Citris\ Pilot\ Cluster/
    # ls
    grapefruit.Millennium.berkeley.edu  lime.Millennium.Berkeley.EDU
    lemon.Millennium.berkeley.edu       orange.Millennium.berkeley.edu
    __SummaryInfo__
    # cd orange.Millennium.berkeley.edu
    # ls
    bytes_in.rrd   cpu_nice.rrd    disk_free.rrd     load_one.rrd     mem_shared.rrd     pkts_out.rrd
    bytes_out.rrd  cpu_num.rrd     disk_total.rrd    mem_buffers.rrd  mem_total.rrd      proc_run.rrd
    cpu_aidle.rrd  cpu_system.rrd  load_fifteen.rrd  mem_cached.rrd   part_max_used.rrd  proc_total.rrd
    cpu_idle.rrd   cpu_user.rrd    load_five.rrd     mem_free.rrd     pkts_in.rrd        swap_free.rrd

say we want to monitor a specific metric (say cpu_user) on this specific host (orange). to get the data all we have to do is use rrdtool

    # date '+now is %s'
    now is 1034101015
    # rrdtool fetch ./cpu_user.rrd AVERAGE -s N-60
                sum
    1034100945: 0.00e+00
    1034100960: 3.73e-01
    1034100975: 7.00e-01
    1034100990: 7.00e-01
    1034101005: 7.00e-01
    1034101020: nan

the first command is just to let you see the timestamp of when i ran this. the rrdtool command is simple and gives you a nice table of recent values (N-60 means now minus 60 seconds, so the data is over the last 60 seconds). the first column is the timestamp when the data was put into the database and the second column (after the ':' delimiter) is the value inserted. the Data Source (DS) name is sum, which you see at the top.

important note: the __SummaryInfo__ databases have 2 Data Sources, sum and num. the num datasource is the number of hosts which were added together to get the sum. it allows you to easily get averages (just divide sum by num). let's open up a __SummaryInfo__ database now...

    # cd ..
    # pwd
    /var/lib/ganglia/rrds/Citris Pilot Cluster
    # ls
    grapefruit.Millennium.berkeley.edu  lime.Millennium.Berkeley.EDU
    __SummaryInfo__                     lemon.Millennium.berkeley.edu
    orange.Millennium.berkeley.edu
    # cd __SummaryInfo__
    # date '+now is %s'
    now is 1034101477
    # rrdtool fetch ./cpu_user.rrd AVERAGE -s N-60
                sum      num
    1034101410: 1.30e+00 3.00e+00
    1034101425: 9.00e-01 3.00e+00
    1034101440: 1.00e-01 3.00e+00
    1034101455: 2.50e+00 3.00e+00
    1034101470: 2.50e+00 3.00e+00
    1034101485: nan      nan

the commandline for getting the data is exactly the same, but you get back a second column which (in this case) tells us that the values from three hosts were added together to get the value sum. make sense?

the nice thing about using the round-robin databases is that you have a strict hierarchical directory structure which allows you to key in on the specific things you want to monitor. it's also a good solution because gmetad has done all the summary work for you. you could write an alert system using simple scripting (bourne, perl, python, et al).

i would like someone to step up to the plate on this. it's a great feature that is just dying to be born (?). just let me know if you (plural .. meaning Leif or someone on the development team ..
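The sum/num arithmetic described in this message can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the sample text is the __SummaryInfo__ fetch output quoted in the message, pasted in as a string instead of being read from rrdtool.

```python
import math


def parse_fetch(text):
    """Parse 'rrdtool fetch' text into (data source names, [(timestamp, values)])."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        stamp, _, rest = line.partition(":")
        rows.append((int(stamp), [float(field) for field in rest.split()]))
    return names, rows


def latest_average(text):
    """Most recent sum/num from a __SummaryInfo__ fetch, skipping nan rows."""
    names, rows = parse_fetch(text)
    i_sum, i_num = names.index("sum"), names.index("num")
    for _, values in reversed(rows):
        total, count = values[i_sum], values[i_num]
        if not math.isnan(total) and not math.isnan(count) and count > 0:
            return total / count
    return None


# Output copied from the __SummaryInfo__ example in the message.
SAMPLE = """sum num
1034101410: 1.30e+00 3.00e+00
1034101425: 9.00e-01 3.00e+00
1034101440: 1.00e-01 3.00e+00
1034101455: 2.50e+00 3.00e+00
1034101470: 2.50e+00 3.00e+00
1034101485: nan nan
"""

print(latest_average(SAMPLE))
```

An alert script along the lines suggested in the message could compare this per-host average against a threshold each time it runs.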
Re: [Ganglia-general] Monitoring
Steven Wagner [EMAIL PROTECTED] writes:

> And, of course, the direction you're probably already going in - writing an app in Perl (or Python or Java or C or C++ or Pascal or Prolog or Pilot or COBOL or ... ) to connect to gmetad, parse the output, and then fire off a stream of passive updates to Nagios/Netsaint via nsca.

Yes, that's what I did last week. It ain't no fun. Nagios' handling of passive service checks isn't flexible enough. And passive host checking Just Isn't Done.

> The ganglia philosophy so far has been to make things work with a minimum of tweaking.

Having set up three different open source monitoring systems over the last few years, it seems to me that it's nearly impossible to set up notifications without a *LOT* of tweaking.

> So there's two ways of doing this, I think:
>
> * We need config files. Lots of them. [WHOOSH!] (1 per node?)
> * Monitoring thresholds are hard-coded as part of each metric definition.

Well, each metric could certainly come with default thresholds, and if you use some inheritance mechanism you could rather easily specify thresholds for all your cluster nodes:

    metacluster:
        warn if load > 5               # default load threshold
        warn if last_heard_from > 60   # default heartbeat threshold
    cluster foo:
        warn if load > 10              # Twin cpu nodes in this cluster, so
                                       # double load threshold
    host odd_one:
        warn if load > 5               # Except for this node

That way, you only need to specify any exceptions from the defaults. Whooshy enough?

What would seem to take some consideration is how to keep track of the metacluster state. You need state tracking, since you want flank detection so that you trigger the klaxons only when a node goes down, and *not* every five minutes during its downtime. And for most metrics you want some hysteresis mechanism so you don't get continuous notifications if a metric fluctuates around the threshold.

--
Leif Nixon
Systems expert
National Supercomputer Centre
Linkoping University
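The inheritance, flank detection and hysteresis ideas in this message can be sketched together. Everything here is hypothetical rather than an existing Ganglia feature: the dictionary layout, the names `threshold` and `EdgeAlarm`, the host name node7 and the clear level of 8 are assumptions chosen for illustration; only the threshold numbers come from the example in the message.

```python
# Thresholds from the example in the message; the most specific level wins.
THRESHOLDS = {
    "metacluster": {"load": 5, "last_heard_from": 60},
    "cluster": {"foo": {"load": 10}},
    "host": {"odd_one": {"load": 5}},
}


def threshold(metric, cluster, host, config=THRESHOLDS):
    """Look up a threshold: host override, then cluster, then default."""
    for level in (config["host"].get(host, {}),
                  config["cluster"].get(cluster, {}),
                  config["metacluster"]):
        if metric in level:
            return level[metric]
    return None


class EdgeAlarm:
    """Reports only state changes, with a lower 'clear' level as hysteresis."""

    def __init__(self, warn, clear):
        self.warn, self.clear, self.raised = warn, clear, False

    def update(self, value):
        """Return 'raise', 'clear', or None for a new sample."""
        if not self.raised and value > self.warn:
            self.raised = True
            return "raise"
        if self.raised and value < self.clear:
            self.raised = False
            return "clear"
        return None


# node7 in cluster foo inherits the cluster-level load threshold of 10.
alarm = EdgeAlarm(warn=threshold("load", "foo", "node7"), clear=8)
print([alarm.update(v) for v in (9, 11, 12, 9.5, 10.5, 7, 11)])
```

The alarm fires once when the load first exceeds 10, stays silent while the value hovers around the threshold, and clears only after the load drops below 8.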