Hi Bernard,
I have never used dtrace before and unfortunately the man page does not give
any examples. Could you tell me how you would like it run?
dtrace [-32 | -64] [-aACeFGHhlqSvVwZ] [-b bufsz] [-c cmd]
    [-D name [=value]] [-I path] [-L path] [-o output]
    [-s script] [-U name] [-x arg [=val]] [-X a | c | s | t]
    [-p pid] [-P provider [[predicate] action]]
    [-m [provider:] module [[predicate] action]]
    [-f [[provider:] module:] function [[predicate] action]]
    [-n [[[provider:] module:] function:] name [[predicate] action]]
    [-i probe-id [[predicate] action]]
Thanks,
-Leo
________________________________
From: Albee, Leo
Sent: Tue 2/26/2008 3:06 PM
To: [email protected]
Subject: GMETAD Hanging
Hello All,
This is my current environment:
3 clusters each running on AIX 5.3 ML 6.
1 cluster made up of 5 nodes all running gmond (ver. 3.0.5) and 1 node
designated the master running gmetad (ver. 3.0.5)
1 cluster made up of 4 nodes all running gmond (ver. 3.0.5) and 1 node
designated the master running gmetad (ver. 3.0.5)
1 cluster made up of 2 nodes all running gmond (ver. 3.0.5)
A Solaris 10 web server running the gmetad daemon (ver 3.0.5).
The PROBLEM:
The gmetad daemon on the web server periodically hangs, which prevents any
new updates to the RRD database. The only workaround is to stop Apache, kill
(yes, kill) the gmetad process, and then restart. It runs fine for a while,
then the hang occurs again.
The RESEARCH:
I have examined the apache access and error logs and they are clean. I then
reviewed the nohup startup file for gmetad with logging verbosity turned to
10. There are no errors appearing in this logfile. I then did a telnet to the
gmond port of each client and successfully received the xml data. I then
decided to perform a truss on the gmetad pid and received the following info:
# truss -p 5944
/10: Stopped by signal #24, SIGTSTP, in nanosleep()
/6: Stopped by signal #24, SIGTSTP, in lwp_park()
/2: Stopped by signal #24, SIGTSTP, in accept()
/7: Stopped by signal #24, SIGTSTP, in lwp_park()
/4: Stopped by signal #24, SIGTSTP, in accept()
/11: Stopped by signal #24, SIGTSTP, in nanosleep()
/8: Stopped by signal #24, SIGTSTP, in nanosleep()
/1: Stopped by signal #24, SIGTSTP, in nanosleep()
/3: Stopped by signal #24, SIGTSTP, in lwp_park()
/9: Stopped by signal #24, SIGTSTP, in nanosleep()
/5: Stopped by signal #24, SIGTSTP, in lwp_park()
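To make the pattern in this output easier to see, here is a minimal Python sketch that tallies the lines by signal and by system call. The regular expression and the summarize() helper are illustrative assumptions written against the eleven lines above; they are not part of truss or ganglia. Every LWP reports SIGTSTP, which is the terminal (job-control) stop signal, so one possible reading is that the process was stopped from outside rather than blocking on its own. That is only a reading of the output, not a confirmed cause.

```python
import re
from collections import Counter

# Sample copied verbatim from the truss output above.
TRUSS_OUTPUT = """\
/10: Stopped by signal #24, SIGTSTP, in nanosleep()
/6: Stopped by signal #24, SIGTSTP, in lwp_park()
/2: Stopped by signal #24, SIGTSTP, in accept()
/7: Stopped by signal #24, SIGTSTP, in lwp_park()
/4: Stopped by signal #24, SIGTSTP, in accept()
/11: Stopped by signal #24, SIGTSTP, in nanosleep()
/8: Stopped by signal #24, SIGTSTP, in nanosleep()
/1: Stopped by signal #24, SIGTSTP, in nanosleep()
/3: Stopped by signal #24, SIGTSTP, in lwp_park()
/9: Stopped by signal #24, SIGTSTP, in nanosleep()
/5: Stopped by signal #24, SIGTSTP, in lwp_park()
"""

# Hypothetical pattern for the "Stopped by signal" lines shown above:
# /<lwp id>: Stopped by signal #<number>, <name>, in <syscall>()
LINE_RE = re.compile(
    r"^/(\d+):\s+Stopped by signal #(\d+), (\w+), in (\w+)\(\)"
)


def summarize(text):
    """Count how many LWPs were stopped by each signal and in each syscall."""
    signals = Counter()
    syscalls = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a "Stopped by signal" line
        _lwp, _signo, signame, syscall = match.groups()
        signals[signame] += 1
        syscalls[syscall] += 1
    return signals, syscalls


signals, syscalls = summarize(TRUSS_OUTPUT)
print(dict(signals))   # {'SIGTSTP': 11}
print(dict(syscalls))  # {'nanosleep': 5, 'lwp_park': 4, 'accept': 2}
```
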
It just seems to go into a sleep state with no warning or info. I have
successfully troubleshot problems in my ganglia configuration before
(IP changes, dir/file permissions, etc.), but this one has me scratching
my head. Is there a known issue with gmetad hanging during the polling
process? I can't afford to lose production performance data like this.
Is there anybody who can help?
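
For anyone who wants to repeat the telnet check on the gmond ports without typing it by hand for each node, here is a minimal Python sketch. It assumes gmond is listening on its default TCP port, 8649 (the port actually configured here is not stated above), and that a complete reply is wrapped in a GANGLIA_XML element. The function names fetch_gmond_xml() and check_hosts() are made up for this example.

```python
import socket

GMOND_PORT = 8649  # assumption: gmond's default TCP port; adjust to your gmond.conf


def fetch_gmond_xml(host, port=GMOND_PORT, timeout=5.0):
    """Connect to gmond, read until it closes the connection, return the text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the socket after sending its XML
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_hosts(hosts, port=GMOND_PORT, timeout=5.0):
    """Return a {host: status} dict describing what each gmond sent back."""
    results = {}
    for host in hosts:
        try:
            xml = fetch_gmond_xml(host, port, timeout)
        except OSError as exc:  # covers refused connections and timeouts
            results[host] = "error: %s" % exc
            continue
        complete = "<GANGLIA_XML" in xml and "</GANGLIA_XML>" in xml
        results[host] = "ok (%d bytes)" % len(xml) if complete else "incomplete XML"
    return results
```

Calling check_hosts(["node1", "node2"]) with real node names in place of these placeholders returns one status string per host. A timeout or an incomplete reply from a node would point at the gmond side, while clean replies from every node (as I saw with telnet) point back at gmetad.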
Thanks,
-Leo
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general