Hi,

On Tue, Jan 15, 2008 at 09:12:39PM -0800, Peter Mueller wrote:
> Digging into this issue, I see it has been fixed for some months:
> Bug: http://developerbugs.linux-foundation.org//show_bug.cgi?id=1697
> Fix (2007/sep/18): http://hg.linux-ha.org/dev/rev/d7e41b482c62

Yes, that's the fix, and it may be applied independently.

Thanks,

Dejan

> I know these are not official, proper, etc. etc.; however, this
> is a major issue for us and I needed it fixed while staying within
> the existing packaging system. To that end, I have applied the
> one-line fix to the CentOS extras src.rpm and bumped it one
> minor revision. For me at least, the src.rpm with the patch
> applied compiled and is working so far. I suppose it wouldn't
> be right to recommend it to anyone, but if anyone is interested,
> I've placed my RPMs at http://world.anarchy.com/~peter/ha/.
> [RPMs are for CentOS 4 x86_64. If this isn't you, and even if it
> is, you may want to consider rebuilding the src.rpm.]
>
> Regards,
> P
>
> > -----Original Message-----
> > From: Peter Mueller
> > Sent: Tuesday, January 15, 2008 5:28 PM
> > To: '[EMAIL PROTECTED]'; [email protected]
> > Subject: RE: [Linux-HA] ever increasing cpu usage
> >
> > > I am seeing that my CPU usage is ever increasing; restarting the
> > > various HA services drops it down to near 0 again, but then it
> > > comes back up again with time.
> > >
> > > Graph of CPU usage:
> > > http://193.201.200.132/~rip/linuxha-cpu.png
> > >
> > > Investigating this, I found that the offending process is
> > > /usr/lib/heartbeat/lrmd
> > >
> > > My setup:
> > >
> > > CentOS 5.1
> > > Heartbeat 2.1 from CentOS extras
> > >
> > > Has anyone seen this behavior before and can perhaps shed some light?
> >
> > I am experiencing the same behavior on one cluster:
> > http://world.anarchy.com/~peter/ha/cpu_increase.png
> > CentOS release 4.5 (Final)
> > Linux oakdb04 2.6.9-55.ELlargesmp
> > heartbeat-stonith-2.1.2-3.el4.centos
> > heartbeat-pils-2.1.2-3.el4.centos
> > heartbeat-2.1.2-3.el4.centos
> >
> > top - 17:25:29 up 81 days, 4:34, 1 user, load average: 0.24, 0.22, 0.18
> > Tasks: 97 total, 2 running, 95 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 3.9% us, 0.4% sy, 0.0% ni, 94.4% id, 1.2% wa, 0.0% hi, 0.0% si
> > Mem: 8163852k total, 8142292k used, 21560k free, 88432k buffers
> > Swap: 8193140k total, 208k used, 8192932k free, 6467864k cached
> >
> >   PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> > 10040 root   16  0  244m 216m 1300 R   26  2.7   5696:05 lrmd
> > 10362 mysql  15  0 7321m 1.0g 5340 S    6 13.3  12773:48 mysqld
> >
> > A few seconds of strace on lrmd:
> > [EMAIL PROTECTED] ~]# strace -p 10040 > foo
> > Process 10040 attached - interrupt to quit
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> > poll([{fd=4, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN|POLLPRI}, {fd=9, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}], 6, 1000) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=6, events=0}], 1, 0) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=7, events=0}], 1, 0) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=8, events=0}], 1, 0) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > poll([{fd=9, events=0}], 1, 0) = 0
> > times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> > Process 10040 detached
> >
> > Regards,
> > P
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
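The `times()` entries in the strace output above can be cross-checked against the `top` snapshot. What follows is a minimal Python sketch, not something from the thread: it assumes the mail-wrapped strace lines have been rejoined, and the helper `parse_times` and the constant `POLL_TIMEOUT_S` are hypothetical names introduced only for illustration. It infers the clock-tick rate from the `poll(..., 6, 1000) = 0` call (the elapsed-time return value of `times()` moves from 1130874123 to 1130874223 across that 1000 ms timeout), converts lrmd's cumulative `tms_utime + tms_stime` into minutes:seconds, and measures how many user-CPU ticks lrmd burned per elapsed tick during the burst of `recvfrom()`/`poll()` calls before that sleep.

```python
import re

# One strace times() entry, with the mail-wrapped continuation rejoined.
# Captures tms_utime, tms_stime and the return value (elapsed real time);
# all three are in clock ticks.
TIMES_RE = re.compile(
    r"times\(\{tms_utime=(\d+), tms_stime=(\d+), "
    r"tms_cutime=\d+, tms_cstime=\d+\}\) = (\d+)"
)


def parse_times(line):
    """Hypothetical helper: return (utime, stime, elapsed) in ticks."""
    match = TIMES_RE.search(line)
    if match is None:
        raise ValueError("not a strace times() line: %r" % line)
    return tuple(int(group) for group in match.groups())


# Three samples copied from the trace.
FIRST = ("times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, "
         "tms_cstime=1264036}) = 1130874110")
BEFORE_POLL = ("times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, "
               "tms_cstime=1264036}) = 1130874123")
AFTER_POLL = ("times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, "
              "tms_cstime=1264036}) = 1130874223")

# poll([...], 6, 1000) = 0 means the call slept for its full 1000 ms timeout.
POLL_TIMEOUT_S = 1.0

first = parse_times(FIRST)
before = parse_times(BEFORE_POLL)
after = parse_times(AFTER_POLL)

# Elapsed ticks across the timed-out poll give the clock-tick rate.
ticks_per_sec = (after[2] - before[2]) / POLL_TIMEOUT_S

# Cumulative user + system CPU of lrmd itself (children excluded),
# the quantity top reports in its TIME+ column.
cpu_seconds = (before[0] + before[1]) / ticks_per_sec
minutes, seconds = divmod(cpu_seconds, 60)

# User-CPU ticks consumed per elapsed tick during the recvfrom()/poll()
# burst that runs from the first traced times() call up to the long poll.
busy_ratio = (before[0] - first[0]) / (before[2] - first[2])

print("tick rate: %.0f ticks/s" % ticks_per_sec)
print("lrmd CPU time (TIME+): %d:%05.2f" % (int(minutes), seconds))
print("user CPU per elapsed tick during the burst: %.2f" % busy_ratio)
```

With these samples the tick rate comes out at 100 per second and lrmd's CPU time at about 5696:12, a few seconds more than the 5696:05 shown by `top` (presumably because the trace was taken shortly after that snapshot). The ratio is 1.0: between the first traced `times()` call and the 1-second `poll()`, user CPU time advanced by as many ticks as wall-clock time did. Figures gathered under strace include tracing overhead, so treat the last one as indicative only.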
