Hello all,

This is my current environment:

- 3 clusters, each running on AIX 5.3 ML 6:
  - 1 cluster of 5 nodes, all running gmond (ver. 3.0.5), with 1 node designated as the master, running gmetad (ver. 3.0.5).
  - 1 cluster of 4 nodes, all running gmond (ver. 3.0.5), with 1 node designated as the master, running gmetad (ver. 3.0.5).
  - 1 cluster of 2 nodes, all running gmond (ver. 3.0.5).
- A Solaris 10 web server running the gmetad daemon (ver. 3.0.5).

The PROBLEM:

The gmetad daemon on the web server periodically hangs, which prevents any new updates to the RRD databases. The only way around it is to stop Apache, kill (yes, kill) the gmetad process, and then restart. It runs fine for a while, and then the hang occurs again.

The RESEARCH:

I examined the Apache access and error logs, and they are clean. I then reviewed the nohup output file from starting gmetad, with logging verbosity set to 10. No errors appear in this log file. Next, I telnetted to the gmond port on each client and successfully received the XML data.

Finally, I ran truss on the gmetad PID and got the following:

    # truss -p 5944
    /10:    Stopped by signal #24, SIGTSTP, in nanosleep()
    /6:     Stopped by signal #24, SIGTSTP, in lwp_park()
    /2:     Stopped by signal #24, SIGTSTP, in accept()
    /7:     Stopped by signal #24, SIGTSTP, in lwp_park()
    /4:     Stopped by signal #24, SIGTSTP, in accept()
    /11:    Stopped by signal #24, SIGTSTP, in nanosleep()
    /8:     Stopped by signal #24, SIGTSTP, in nanosleep()
    /1:     Stopped by signal #24, SIGTSTP, in nanosleep()
    /3:     Stopped by signal #24, SIGTSTP, in lwp_park()
    /9:     Stopped by signal #24, SIGTSTP, in nanosleep()
    /5:     Stopped by signal #24, SIGTSTP, in lwp_park()

It just seems to go into a sleep state with no warning or log output. I have successfully troubleshot problems in my Ganglia configuration before (IP changes, directory/file permissions, etc.), but this one has me scratching my head.
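The telnet check from the research steps can be scripted so it is quick to rerun the next time gmetad hangs. The following is a minimal Python sketch, not part of Ganglia itself. It assumes each gmond listens on the default TCP port 8649 (check tcp_accept_channel in gmond.conf if yours differs) and that gmond writes its full XML dump and then closes the connection, which is what the telnet test relies on. The helper names fetch_gmond_xml, is_ganglia_xml and check_hosts are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# Assumption: gmond's default TCP accept port. Adjust to match gmond.conf.
GMOND_PORT = 8649


def fetch_gmond_xml(host, port=GMOND_PORT, timeout=10.0):
    """Connect to gmond and read its XML dump until gmond closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def is_ganglia_xml(payload):
    """Return True if payload parses as XML whose root element is GANGLIA_XML."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return False
    return root.tag == "GANGLIA_XML"


def check_hosts(hosts, port=GMOND_PORT):
    """Map each host to 'ok', 'bad xml', or a short connection error string."""
    results = {}
    for host in hosts:
        try:
            payload = fetch_gmond_xml(host, port)
        except OSError as exc:  # refused, unreachable, DNS failure or timeout
            results[host] = "error: %s" % exc
            continue
        results[host] = "ok" if is_ganglia_xml(payload) else "bad xml"
    return results
```

For example, check_hosts(["node1", "node2"]) (hypothetical host names) returns a dict mapping each host to its status. Running it while gmetad is hung shows whether every data source is still answering at that moment.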
Is there a known issue with gmetad hanging during the polling process? I can't afford to lose production performance data like this. Can anybody help?

Thanks,
-Leo
_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general

