I have slowly re-configured aixdisk.conf onto 3 prod machines, for a total of 2154 metrics across the 3 hosts, and gmond did puke for a brief 15 seconds, after which rrd was able to display properly again. This happened once, when I hit 'get fresh data'. I may reduce it to 2 prod hosts totaling (2154-264) metrics, but has anyone started to research this gmond problem?
Thanks!

Message: 3
Date: Wed, 18 Sep 2013 08:24:56 -0400
From: Derek Smith <[email protected]>
Subject: Re: [Ganglia-general] gmond core dumping, again on head node.
To: Derek Smith <[email protected]>, "[email protected]" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

It seems that the problem stems from the aixdisk.conf and its C code. I renamed aixdisk.conf and restarted gmond on all my hosts and gmond has stayed up for over 12 hours. If anyone needs the core file, let me know!

Thx!

From: Derek Smith
Sent: Tuesday, September 17, 2013 02:07 PM
To: [email protected]
Subject: gmond core dumping, again on head node.

Ever since my upgrade to 3.6, gmond is very shaky to say the least... gmond keeps seg faulting. I have the core file if needed! Any help much appreciated! Thank you!

My ENV is:
AIX 6100-08-03-1339
gmond 3.6.0
gmetad 3.6.0
web front-end "3.5.10"; Server version: Apache/2.4.3 (Unix)
RRDtool 1.4.8 Copyright 1997-2013 by Tobias Oetiker <[email protected]>
gmond rrdcache: "/var/lib/ganglia/rrdcached/rrdcached.socket";
gmetad rrdcache: RRDCACHED_ADDRESS=/var/lib/ganglia/rrdcached/rrdcached.socket

Error report details

# cat php-errors.log
[05-Sep-2013 13:59:26 America/Detroit] PHP Notice: Undefined index: hreg in /var/www/htdocs/ganglia3510/ganglia-web-3.5.10/graph_all_periods.php on line 84
[05-Sep-2013 14:05:06 America/Detroit] PHP Notice: Undefined index: hreg in /var/www/htdocs/ganglia3510/ganglia-web-3.5.10/graph_all_periods.php on line 843

CORE FILE NAME
/var/adm/ras/corefiles/core.9371670.17154125
PROGRAM NAME
gmond
STACK EXECUTION DISABLED
0
COME FROM ADDRESS REGISTER
rmgr_disa FFFFF9B4
PROCESSOR ID
hw_fru_id: 0
hw_cpu_id: 4
ADDITIONAL INFORMATION
extend_br 238
extend_br 1E8

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/gmond SIG/11 FLDS/extend_br VALU/238 FLDS/rmgr_disa

Syslog details, core dump 1215-ish ESDT

Sep 17 12:14:31 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:14:31 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:14:45 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:14:45 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:14:57 ganglia01ap daemon:info xntpd[4063412]: synchronized to 10.1.1.200, stratum=1
Sep 17 12:15:00 ganglia01ap daemon:notice ConfigRM[7340166]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,347 :::CONFIGRM_STARTED_ST IBM.ConfigRM daemon has started.
Sep 17 12:15:00 ganglia01ap daemon:err|error ConfigRM[7340166]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 6895a4e3:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,506 :::CONFIGRM_ERROR_ER An internal error was encountered in the configuration manager daemon (IBM.ConfigRMd). Error Code 00018001 Message Catalog Name ct_rmf.cat Message Set 1 Message Identifier 7 Message Inserts 00000005
Sep 17 12:15:00 ganglia01ap daemon:notice ConfigRM[7340168]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,347 :::CONFIGRM_STARTED_ST IBM.ConfigRM daemon has started.
Sep 17 12:15:00 ganglia01ap daemon:err|error ConfigRM[7340168]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 6895a4e3:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,506 :::CONFIGRM_ERROR_ER An internal error was encountered in the configuration manager daemon (IBM.ConfigRMd). Error Code 00018001 Message Catalog Name ct_rmf.cat Message Set 1 Message Identifier 7 Message Inserts 00000005
Sep 17 12:15:01 ganglia01ap daemon:notice ConfigRM[7340170]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,347 :::CONFIGRM_STARTED_ST IBM.ConfigRM daemon has started.
Sep 17 12:15:01 ganglia01ap daemon:err|error ConfigRM[7340170]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 6895a4e3:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.57,506 :::CONFIGRM_ERROR_ER An internal error was encountered in the configuration manager daemon (IBM.ConfigRMd). Error Code 00018001 Message Catalog Name ct_rmf.cat Message Set 1 Message Identifier 7 Message Inserts 00000005
Sep 17 12:15:01 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:15:01 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:15:16 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:15:16 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:15:29 ganglia01ap daemon:info xntpd[4063412]: synchronized to 10.1.1.201, stratum=1
Sep 17 12:15:31 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:15:31 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:15:46 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:15:46 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:16:01 ganglia01ap daemon:info xntpd[4063412]: synchronized to 10.1.1.200, stratum=1
Sep 17 12:16:01 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:16:01 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:16:08 ganglia01ap aso:notice aso[15073350]: [HIB] Used entitlement per unfolded vCPU is below threshold (13% of a core).
Sep 17 12:16:08 ganglia01ap aso:notice aso[15073350]: [HIB] Cache optimizations will hibernate until used entitlement is at least 30% of a core per unfolded vCPU
Sep 17 12:16:16 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
Sep 17 12:16:16 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() got no answer from any [IBMpower] datasource
Sep 17 12:16:31 ganglia01ap user:info /opt/freeware/sbin/gmetad[8192110]: data_thread() for [IBMpower] failed to contact node 10.255.9.12
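
Editorial note, not part of the original thread: when lining up a gmond core dump against syslog, it can help to count the gmetad "failed to contact node" entries per data source and node and note when they first appear. The following is a minimal sketch only. It assumes each syslog entry sits on a single line with the terminal wrapping repaired ("node", not "nod e"); the regular expression and the function name `summarize_failures` are illustrative assumptions, not Ganglia tooling.

```python
import re
from collections import Counter

# Matches gmetad entries of the form quoted in the syslog excerpt, e.g.
#   Sep 17 12:14:31 host user:info /path/gmetad[123]: data_thread() for [SRC] failed to contact node 1.2.3.4
# The pattern is an assumption based on those lines, not an official format.
FAILED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+\S+\s+"
    r"\S*gmetad\[\d+\]:\s+data_thread\(\) for \[(?P<source>[^\]]+)\] "
    r"failed to contact node (?P<node>\S+)"
)


def summarize_failures(lines):
    """Return (counts, first_seen) keyed by (data source, node address)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = FAILED.match(line.strip())
        if match is None:
            continue  # not a "failed to contact node" entry
        key = (match.group("source"), match.group("node"))
        counts[key] += 1
        # Remember the timestamp of the first failure for this key.
        first_seen.setdefault(key, match.group("ts"))
    return counts, first_seen


if __name__ == "__main__":
    import sys

    counts, first_seen = summarize_failures(sys.stdin)
    for (source, node), n in counts.items():
        print(f"{source} {node}: {n} failures, first at {first_seen[(source, node)]}")
```

Feeding the syslog file to the script on standard input prints one summary line per data source and node; the first timestamp can then be compared with the time the core file was written.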
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

End of Ganglia-general Digest, Vol 88, Issue 8
**********************************************

