Hi,

On Tue, Jan 15, 2008 at 09:12:39PM -0800, Peter Mueller wrote:
> Digging into this issue, I see it has been fixed for some months:
> Bug http://developerbugs.linux-foundation.org//show_bug.cgi?id=1697
> Fix (2007/sep/18): http://hg.linux-ha.org/dev/rev/d7e41b482c62

Yes, that's the fix and it may be independently applied.

Thanks,

Dejan

> I know these are not official, proper, etc. etc., however this
> is a major issue for us and I needed it fixed while staying in
> the existing packaging system.  To that end, I have applied the
> one-line fix to the CentOS extras src.rpm and bumped it one
> minor revision.  For me at least, the src.rpm with the patch
> applied compiled and so far is working.  I suppose it wouldn't
> be right to recommend it to anyone, but if anyone is interested
> I've placed my RPMs at http://world.anarchy.com/~peter/ha/.  [
> RPMs are for CentOS 4 x86_64.  If this isn't you, and even if it
> is, you may want to consider rebuilding the src.rpm ].

> 
> Regards,
> P
> 
> > -----Original Message-----
> > From: Peter Mueller
> > Sent: Tuesday, January 15, 2008 5:28 PM
> > To: '[EMAIL PROTECTED]'; [email protected]
> > Subject: RE: [Linux-HA] ever increasing cpu usage
> > 
> > > I am seeing that my CPU usage is ever increasing, restarting the
> > > various HA services drops it down to near 0 again but then it comes
> > > back up again with time.
> > >
> > > Graph of CPU usage:
> > > http://193.201.200.132/~rip/linuxha-cpu.png
> > >
> > > Investigating this I found that the offending process is 
> > > /usr/lib/heartbeat/lrmd
> > >
> > > My setup:
> > >
> > > CentOS 5.1
> > > Heartbeat 2.1 from centos extras
> > >
> > > Has anyone seen this behavior before and can perhaps shed some light?
> > 
> > I am experiencing the same behavior on one cluster:
> > http://world.anarchy.com/~peter/ha/cpu_increase.png
> > CentOS release 4.5 (Final)
> > Linux oakdb04 2.6.9-55.ELlargesmp
> > heartbeat-stonith-2.1.2-3.el4.centos
> > heartbeat-pils-2.1.2-3.el4.centos
> > heartbeat-2.1.2-3.el4.centos
> > 
> > top - 17:25:29 up 81 days,  4:34,  1 user,  load average: 0.24, 0.22, 0.18
> > Tasks:  97 total,   2 running,  95 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  3.9% us,  0.4% sy,  0.0% ni, 94.4% id,  1.2% wa,  0.0% hi,  0.0% si
> > Mem:   8163852k total,  8142292k used,    21560k free,    88432k buffers
> > Swap:  8193140k total,      208k used,  8192932k free,  6467864k cached
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 10040 root      16   0  244m 216m 1300 R   26  2.7   5696:05 lrmd
> > 10362 mysql     15   0 7321m 1.0g 5340 S    6 13.3  12773:48 mysqld
> > 
> > A few seconds of strace on lrmd:
> > [EMAIL PROTECTED] ~]# strace -p 10040 > foo
> > Process 10040 attached - interrupt to quit
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874123
> > poll([{fd=4, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}, {fd=6,
> > events=POLLIN|POLLPRI}, {fd=7, events=POLLIN|POLLPRI}, {fd=9,
> > events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}], 6, 1000) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=6, events=0}], 1, 0)          = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=7, events=0}], 1, 0)          = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=8, events=0}], 1, 0)          = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0)   = -1 EAGAIN (Resource temporarily
> > unavailable)
> > poll([{fd=9, events=0}], 1, 0)          = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402,
> > tms_cstime=1264036}) = 1130874223
> > Process 10040 detached
> > 
> > Regards,
> > P
> 
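
For anyone who wants to check whether their own lrmd shows the same
pattern as the trace quoted above (recvfrom() returning EAGAIN followed
by poll() with a zero timeout, over and over), here is a minimal sketch
in Python that tallies a saved strace log. It is only an illustration:
the function name summarize_strace and the sample text are made up for
this example (the sample lines are copied from the trace above), and it
assumes the log was written unquoted to a file, for instance with
strace's -o option.

```python
import re
from collections import Counter

# A syscall line starts with the call name followed by "(".
SYSCALL_RE = re.compile(r"^(\w+)\(")
# A poll() whose last argument (the timeout) is 0 returns immediately.
ZERO_TIMEOUT_POLL_RE = re.compile(r"^poll\(.*,\s*0\)\s*=")


def summarize_strace(text):
    """Count syscalls, EAGAIN results and zero-timeout polls in strace text.

    Hypothetical helper for illustration; continuation lines of wrapped
    entries do not start with a syscall name and are skipped by the
    syscall counter.
    """
    calls = Counter()
    eagain = 0
    zero_polls = 0
    for line in text.splitlines():
        line = line.strip()
        match = SYSCALL_RE.match(line)
        if match:
            calls[match.group(1)] += 1
        if "EAGAIN" in line:
            eagain += 1
        if ZERO_TIMEOUT_POLL_RE.match(line):
            zero_polls += 1
    return calls, eagain, zero_polls


# Sample lines copied from the trace in this thread.
sample = """\
times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=6, events=0}], 1, 0)          = 0
recvfrom(6, 0x51f533, 3973, 64, 0, 0)   = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=6, events=0}], 1, 0)          = 0
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 1000) = 0
"""

calls, eagain, zero_polls = summarize_strace(sample)
print(dict(calls), "EAGAIN:", eagain, "zero-timeout polls:", zero_polls)
```

If nearly every recvfrom() in a few seconds of trace ends in EAGAIN and
most polls have a zero timeout, the process is spinning rather than
sleeping, which would be consistent with the CPU graphs posted here.
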
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems