Hi Leo,
You might be running into a problem with RRDTool here. We ran into a
similar problem where RRDTool processes would not exit correctly and
kept running indefinitely.
The bug seems to have been present since v1.2.19/20 and was fixed in v1.2.26.
Could you try v1.2.26 of RRDTool? Hopefully that helps.
Here are some pointers to the error:
https://lists.oetiker.ch/pipermail/rrd-developers/2007-November/002106.html
https://lists.oetiker.ch/pipermail/rrd-developers/2007-November/002109.html
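To check whether an installed version falls in the range described above, something like the following could be used. It is only a sketch: the affected range (1.2.19 up to, but not including, 1.2.26) is taken from the rough description in this message rather than from the RRDTool changelog, and the names parse_version and is_affected are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.2.19' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


# Assumed range, based on "present since v1.2.19/20 and fixed with v1.2.26".
FIRST_AFFECTED = (1, 2, 19)
FIRST_FIXED = (1, 2, 26)


def is_affected(version):
    """Return True if the version lies in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


if __name__ == "__main__":
    for v in ("1.2.19", "1.2.26"):
        print(v, "affected" if is_affected(v) else "not affected")
```

The version string itself would have to be read off the installed rrdtool by hand; the sketch only does the comparison.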
Regards,
Michael
Albee, Leo wrote:
Hi Bernard,
The server has been rebooted as requested. I have already tried stopping the gmetad server, killing off the rrdtool processes, and then restarting the gmetad server. The processes start accumulating again over a period of time. This server is not having any hardware issues according to its logfiles. The version of rrdtool I am running is 1.2.19 and the ganglia build is 3.0.4.
Leo Albee
Systems Manager
Children's Hospital - Boston
Phone Number: 857-218-4131
Email: [EMAIL PROTECTED]
________________________________
From: Bernard Li [mailto:[EMAIL PROTECTED]
Sent: Mon 12/10/2007 1:05 PM
To: Albee, Leo
Cc: [email protected]
Subject: Re: [Ganglia-general] Ganglia rrdtool problem?
Hi Leo:
On 12/10/07, Albee, Leo <[EMAIL PROTECTED]> wrote:
I have ganglia (3.0.4) setup in the following configuration:
3 clusters each with a head cluster node
* cluster1 = 4 nodes
* cluster2 = 6 nodes
* cluster3 = 2 nodes
1 ganglia server/Web
The configuration itself works correctly. The problem I have is that
the ganglia server seems to be CPU-bound by rrdtool processes. When I check the server
each day, new rrdtool processes have been added to the existing long-running
processes. I've searched high and low and don't have anything to go on. Please see the
"ps" output from the ganglia server below and note the long-running rrdtool
processes. That many processes seems excessive for 12 nodes. One
other thing to note: when I initially started the server there were only 6
rrdtool processes, and they were taking practically all the CPU cycles. Any help would be
appreciated.
nobody 22219 21854 0 Nov 09 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194644945 --end 1194648545 --width 300
nobody 4954 25991 0 09:32:04 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194532305 --end 1195137105 --width 300
nobody 24563 23152 0 12:53:21 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195059194 --end 1195062794 --width 300
nobody 1092 22274 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194721069 --end 1194724669 --width 300
nobody 24565 24563 4 12:53:21 ? 122:18 /usr/bin/rrdtool graph - --start 1195059194 --end 1195062794 --width 300 --heig
nobody 22071 22069 4 Nov 09 ? 1895:27 /usr/bin/rrdtool graph - --start 1194644923 --end 1194648523 --width 300 --heig
nobody 25498 1252 0 14:43:20 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1192650175 --end 1195069344 --width 300
nobody 1588 1203 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194639101 --end 1194725501 --width 300
nobody 1667 1473 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194639130 --end 1194725530 --width 300
nobody 22145 21855 0 Nov 09 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194644945 --end 1194648545 --width 300
nobody 1169 931 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194721091 --end 1194724691 --width 300
nobody 25421 24735 0 14:43:09 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1192650175 --end 1195069344 --width 300
nobody 930 929 5 Nov 10 ? 1071:04 /usr/bin/rrdtool graph - --start 1194638269 --end 1194724669 --width 300 --heig
nobody 1247 1246 4 Nov 10 ? 1149:24 /usr/bin/rrdtool graph - --start 1194119913 --end 1194724713 --width 300 --heig
nobody 1589 1588 4 Nov 10 ? 1065:28 /usr/bin/rrdtool graph - --start 1194639101 --end 1194725501 --width 300 --heig
nobody 1668 1667 4 Nov 10 ? 1061:06 /usr/bin/rrdtool graph - --start 1194639130 --end 1194725530 --width 300 --heig
nobody 24797 24796 4 13:44:36 ? 117:50 /usr/bin/rrdtool graph - --start 1195062256 --end 1195065856 --width 300 --heig
nobody 1170 1169 4 Nov 10 ? 1075:51 /usr/bin/rrdtool graph - --start 1194721091 --end 1194724691 --width 300 --heig
nobody 22221 22219 5 Nov 09 ? 1894:45 /usr/bin/rrdtool graph - --start 1194644945 --end 1194648545 --width 300 --heig
nobody 1246 22163 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194119913 --end 1194724713 --width 300
nobody 1858 1616 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194721979 --end 1194725579 --width 300
nobody 24796 23091 0 13:44:36 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195062256 --end 1195065856 --width 300
nobody 22146 22145 4 Nov 09 ? 1912:40 /usr/bin/rrdtool graph - --start 1194644945 --end 1194648545 --width 300 --heig
nobody 25422 25421 4 14:43:09 ? 107:17 /usr/bin/rrdtool graph - --start 1192650175 --end 1195069344 --width 300 --heig
nobody 1453 21857 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194120678 --end 1194725478 --width 300
nobody 4711 3085 0 09:30:43 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195133412 --end 1195137012 --width 300
nobody 22069 21877 0 Nov 09 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194644923 --end 1194648523 --width 300
nobody 4876 4875 4 09:31:11 ? 0:13 /usr/bin/rrdtool graph - --start 1194532261 --end 1195137061 --width 300 --heig
nobody 25875 25874 4 15:12:23 ? 111:01 /usr/bin/rrdtool graph - --start 1195067510 --end 1195071110 --width 300 --heig
nobody 24641 1683 0 12:53:55 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195059217 --end 1195062817 --width 300
nobody 24642 24641 4 12:53:55 ? 120:06 /usr/bin/rrdtool graph - --start 1195059217 --end 1195062817 --width 300 --heig
nobody 4712 4711 4 09:30:43 ? 0:15 /usr/bin/rrdtool graph - --start 1195133412 --end 1195137012 --width 300 --heig
nobody 1454 1453 4 Nov 10 ? 1105:56 /usr/bin/rrdtool graph - --start 1194120678 --end 1194725478 --width 300 --heig
nobody 4788 4786 4 09:30:55 ? 0:14 /usr/bin/rrdtool graph - --start 1195133440 --end 1195137040 --width 300 --heig
nobody 23067 23066 4 Nov 13 ? 240:42 /usr/bin/rrdtool graph - --start 1195009526 --end 1195013126 --width 300 --heig
nobody 929 21856 0 Nov 10 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194638269 --end 1194724669 --width 300
nobody 4875 21858 0 09:31:11 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194532261 --end 1195137061 --width 300
nobody 1093 1092 4 Nov 10 ? 1138:34 /usr/bin/rrdtool graph - --start 1194721069 --end 1194724669 --width 300 --heig
nobody 26058 24815 0 16:17:29 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195071419 --end 1195075019 --width 300
nobody 4786 1130 0 09:30:55 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195133440 --end 1195137040 --width 300
nobody 26060 26058 4 16:17:29 ? 102:33 /usr/bin/rrdtool graph - --start 1195071419 --end 1195075019 --width 300 --heig
nobody 23066 1870 0 Nov 13 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195009526 --end 1195013126 --width 300
nobody 25874 24576 0 15:12:23 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195067510 --end 1195071110 --width 300
nobody 23145 23144 4 Nov 13 ? 250:06 /usr/bin/rrdtool graph - --start 1195009552 --end 1195013152 --width 300 --heig
nobody 23144 21875 0 Nov 13 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1195009552 --end 1195013152 --width 300
nobody 4955 4954 4 09:32:04 ? 0:09 /usr/bin/rrdtool graph - --start 1194532305 --end 1195137105 --width 300 --heig
nobody 1859 1858 4 Nov 10 ? 1117:21 /usr/bin/rrdtool graph - --start 1194721979 --end 1194725579 --width 300 --heig
nobody 25499 25498 4 14:43:20 ? 108:40 /usr/bin/rrdtool graph - --start 1192650175 --end 1195069344 --width 300 --heig
This is way off from normal behaviour. I suggest you stop the gmetad
daemon, kill off all your rrdtool processes and start it up again
(wouldn't hurt to reboot the machine if possible too...). Have you
also checked to see if you are having HD problems?
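Before killing anything, it may help to list which rrdtool processes have actually accumulated CPU time. The sketch below is one possible way to pick them out of saved "ps" output. It assumes each process entry sits on a single line (a wrapped entry would need rejoining first), that the columns are UID PID PPID C STIME TTY TIME CMD, and that TIME is cumulative CPU time as minutes:seconds, which is what the listing in this thread appears to show. The function name and the 60-minute threshold are arbitrary choices for illustration, and the sketch only reports PIDs, it does not kill anything.

```python
import re

# Assumed layout of one "ps" line: UID PID PPID C STIME TTY TIME CMD.
# STIME may be one token ("09:32:04") or two ("Nov 09"); TIME is assumed
# to be cumulative CPU time written as MINUTES:SECONDS.
PS_LINE = re.compile(
    r"^(?P<uid>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<c>\d+)\s+"
    r"(?P<stime>.+?)\s+(?P<tty>\S+)\s+(?P<min>\d+):(?P<sec>\d\d)\s+(?P<cmd>.*)$"
)


def find_runaway_rrdtool(ps_lines, min_cpu_minutes=60):
    """Return (pid, cpu_minutes) for rrdtool graph processes over the threshold."""
    hits = []
    for line in ps_lines:
        m = PS_LINE.match(line.strip())
        if m is None:
            continue  # header, continuation line, or an unexpected format
        if not m.group("cmd").startswith("/usr/bin/rrdtool graph"):
            continue  # skip the "sh -c" wrappers and unrelated processes
        minutes = int(m.group("min"))
        if minutes >= min_cpu_minutes:
            hits.append((int(m.group("pid")), minutes))
    return hits


if __name__ == "__main__":
    sample = [
        "nobody 22071 22069 4 Nov 09 ? 1895:27 /usr/bin/rrdtool graph - --start 1194644923",
        "nobody 22069 21877 0 Nov 09 ? 0:00 sh -c /usr/bin/rrdtool graph - --start 1194644923",
        "nobody 4876 4875 4 09:31:11 ? 0:13 /usr/bin/rrdtool graph - --start 1194532261",
    ]
    for pid, minutes in find_runaway_rrdtool(sample):
        print(pid, minutes)
```

Matching on the command prefix keeps the "sh -c" parent shells out of the result, so only the rrdtool processes that are burning CPU are listed.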
Which version of rrdtool are you using, and on what Linux
distribution/version/arch are you running the gmetad/web frontend?
Cheers,
Bernard
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general