Hello, I can confirm my strace looks the same.
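For anyone who wants to compare their own capture against the one quoted below, here is a rough Python sketch that tallies syscalls and EAGAIN results in saved strace output. It is not part of heartbeat, and `summarize_strace` is a name made up for illustration. It assumes each syscall starts a line with the call name, optionally behind `> ` quote markers, and it ignores wrapped continuation lines.

```python
import re
from collections import Counter

# Syscall name at the start of a line, e.g. "recvfrom(6, ...",
# optionally preceded by email quote markers ("> ").
SYSCALL_RE = re.compile(r"^\s*(?:>\s*)*([a-z_][a-z0-9_]*)\(")


def summarize_strace(text):
    """Return (calls, eagain): per-syscall counts and EAGAIN failure counts."""
    calls = Counter()
    eagain = Counter()
    for line in text.splitlines():
        match = SYSCALL_RE.match(line)
        if not match:
            # Wrapped continuation lines, "Process ... attached", blanks, etc.
            continue
        name = match.group(1)
        calls[name] += 1
        if "EAGAIN" in line:
            eagain[name] += 1
    return calls, eagain
```

Feeding it the text of a saved trace and printing `calls.most_common()` gives a quick view of whether `times`, `recvfrom` and `poll` dominate in the same way on another host.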

On 1/16/08, Peter Mueller <[EMAIL PROTECTED]> wrote:
> > I am seeing that my CPU usage is ever increasing, restarting the
> > various HA services drops it down to near 0 again but then it comes
> > back up again with time.
> >
> > Graph of CPU usage:
> > http://193.201.200.132/~rip/linuxha-cpu.png
> >
> > Investigating this I found that the offending process is
> > /usr/lib/heartbeat/lrmd
> >
> > My setup:
> >
> > CentOS 5.1
> > Heartbeat 2.1 from centos extras
> >
> > Has anyone seen this behavior before and can perhaps shed some light?
>
> I am experiencing the same behavior on one cluster:
> http://world.anarchy.com/~peter/ha/cpu_increase.png
> CentOS release 4.5 (Final)
> Linux oakdb04 2.6.9-55.ELlargesmp
> heartbeat-stonith-2.1.2-3.el4.centos
> heartbeat-pils-2.1.2-3.el4.centos
> heartbeat-2.1.2-3.el4.centos
>
> top - 17:25:29 up 81 days, 4:34, 1 user, load average: 0.24, 0.22, 0.18
> Tasks: 97 total, 2 running, 95 sleeping, 0 stopped, 0 zombie
> Cpu(s): 3.9% us, 0.4% sy, 0.0% ni, 94.4% id, 1.2% wa, 0.0% hi, 0.0% si
> Mem: 8163852k total, 8142292k used, 21560k free, 88432k buffers
> Swap: 8193140k total, 208k used, 8192932k free, 6467864k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 10040 root      16   0  244m 216m 1300 R   26  2.7   5696:05 lrmd
> 10362 mysql     15   0 7321m 1.0g 5340 S    6 13.3  12773:48 mysqld
>
> A few seconds of strace on lrmd:
> [EMAIL PROTECTED] ~]# strace -p 10040 > foo
> Process 10040 attached - interrupt to quit
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> times({tms_utime=33660093, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874110
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> times({tms_utime=33660100, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874116
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874123
> poll([{fd=4, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN|POLLPRI}, {fd=9, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}], 6, 1000) = 0
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> recvfrom(6, 0x51f533, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=6, events=0}], 1, 0) = 0
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> recvfrom(7, 0x522603, 3973, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=7, events=0}], 1, 0) = 0
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> recvfrom(8, 0x524a09, 3343, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=8, events=0}], 1, 0) = 0
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> recvfrom(9, 0x527144, 3972, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=9, events=0}], 1, 0) = 0
> times({tms_utime=33660106, tms_stime=517128, tms_cutime=1049402, tms_cstime=1264036}) = 1130874223
> Process 10040 detached
>
> Regards,
> P
>

--
R.I.Pienaar
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
