The process's memory size keeps increasing. Could this be caused by a memory leak?
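To tell steady growth apart from a one-off jump, it may help to put a number on it. The following is only a minimal sketch, not something heartbeat itself provides: `growth_mib_per_hour` and `read_rss_kib` are hypothetical helper names, the timestamps are assumed to fall on the same day, and the sample values are the RES figures from the top output in this thread. The `/proc/<pid>/status` reader assumes a Linux system.

```python
from datetime import datetime


def growth_mib_per_hour(samples):
    """Average RES growth in MiB/hour between the first and last sample.

    samples: list of ("HH:MM:SS", res_mib) tuples, same day, in time order.
    """
    fmt = "%H:%M:%S"
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    elapsed = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
    elapsed_h = elapsed.total_seconds() / 3600
    if elapsed_h <= 0:
        raise ValueError("samples must be in increasing time order")
    return (r1 - r0) / elapsed_h


def read_rss_kib(pid):
    """Read the current resident set size (KiB) from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found (kernel thread or process gone?)")


# RES column of PID 3358 as reported by top at three points on the same day.
samples = [("16:00:33", 98), ("16:47:50", 101), ("23:23:25", 118)]
rate = growth_mib_per_hour(samples)
print(f"{rate:.2f} MiB/hour")  # prints "2.71 MiB/hour"
```

A roughly constant rate over many hours would be consistent with a leak, though it does not prove one; calling `read_rss_kib(3358)` from a cron job or a loop would give more samples than the three available here.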

# top -b -d 2 -n 2 -p 3358 ; gstack 3358
top - 23:23:25 up 2 days, 15:09,  1 user,  load average: 12.33, 12.03, 11.56
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 37.8%us,  0.3%sy,  0.0%ni, 61.1%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3962268k total,  1440752k used,  2521516k free,   142516k buffers
Swap:  4192956k total,        0k used,  4192956k free,  1110052k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  118m 118m 5384 R 95.5  3.1   1447:46 heartbeat

top - 23:23:27 up 2 days, 15:09,  1 user,  load average: 12.33, 12.03, 11.56
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 95.0%us,  0.0%sy,  0.0%ni,  0.0%id,  4.5%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   3962268k total,  1435076k used,  2527192k free,   142516k buffers
Swap:  4192956k total,        0k used,  4192956k free,  1110060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  118m 118m 5384 R 94.5  3.1   1447:48 heartbeat

#0  0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
#1  0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
#2  0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3  0x08056ae8 in ?? ()
#4  0x0805a247 in main ()

# top -b -d 2 -n 2 -p 3358 && gstack 3358
top - 16:47:50 up 2 days,  8:33,  3 users,  load average: 7.90, 7.57, 7.33
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.1%us,  0.3%sy,  0.0%ni, 68.0%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3962268k total,  1309556k used,  2652712k free,   141904k buffers
Swap:  4192956k total,        0k used,  4192956k free,   998576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  101m 101m 5384 R 95.5  2.6   1072:25 heartbeat

top - 16:47:52 up 2 days,  8:33,  3 users,  load average: 7.90, 7.57, 7.33
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 95.5%us,  0.0%sy,  0.0%ni,  4.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3962268k total,  1309580k used,  2652688k free,   141904k buffers
Swap:  4192956k total,        0k used,  4192956k free,   998636k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  101m 101m 5384 R 94.5  2.6   1072:27 heartbeat

On Mon, Aug 30, 2010 at 4:15 PM, Oozzzii Oz <[email protected]> wrote:

> Heartbeat has been pegging the CPU on the primary DRBD cluster for hours
> now. I see some timeout errors in the logs but nothing else to indicate
> why the heartbeat process is consuming so many CPU cycles. Its memory
> size is significantly larger than on similar systems, where it is usually
> at 13 MB and using only 0.4 CPU.
>
> Can anyone share some tips as to where I might look for a probable cause?
> I'm sharing as much detail as possible on the current setup.
>
> SNSfile01:/var/log # top -b -d 2 -n 2 -p 3358
> top - 16:00:33 up 2 days,  7:46,  2 users,  load average: 7.59, 7.54, 7.53
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 30.2%us,  0.3%sy,  0.0%ni, 68.8%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
> Mem:   3962268k total,  1287412k used,  2674856k free,   141692k buffers
> Swap:  4192956k total,        0k used,  4192956k free,   982732k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3358 root      -2   0 99.0m  98m 5384 R 95.9  2.6   1027:44 heartbeat
>
>
> top - 16:00:35 up 2 days,  7:46,  2 users,  load average: 8.02, 7.63, 7.56
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s): 93.1%us,  0.5%sy,  0.0%ni,  6.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   3962268k total,  1287412k used,  2674856k free,   141692k buffers
> Swap:  4192956k total,        0k used,  4192956k free,   982732k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3358 root      -2   0 99.0m  98m 5384 S 93.1  2.6   1027:46 heartbeat
>
> gstack 3358
> #0  0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
> #1  0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
> #2  0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
> #3  0x08056ae8 in ?? ()
> #4  0x0805a247 in main ()
>
> SNSfile01:/var/log # ps aux|grep -i heart
> root      3358 30.9  2.5 101888 101884 ?  RLs  Aug28 1038:38 heartbeat: master control process
> nobody    3367  0.0  0.1   6720   6716 ?  SL   Aug28    0:04 heartbeat: FIFO reader
> nobody    3368  0.0  0.1   6716   6712 ?  RL   Aug28    1:41 heartbeat: write: bcast eth3
> nobody    3369  0.0  0.1   6716   6712 ?  SL   Aug28    0:24 heartbeat: read: bcast eth3
>
>
> SNSfile01:/var/log # more /etc/ha.d/ha.cf /etc/ha.d/haresources
> ::::::::::::::
> /etc/ha.d/ha.cf
> ::::::::::::::
> logfile /var/log/ha-log
> debugfile /var/log/ha-debug
> bcast eth3
> udpport 694
> warntime 8
> deadtime 30
> initdead 120
> keepalive 2
> auto_failback on
> node SNSfile01
> node SNSfile02
> ::::::::::::::
> /etc/ha.d/haresources
> ::::::::::::::
> SNSfile01 IPaddr::10.10.1.180/24 drbddisk::r0 Filesystem::/dev/drbd0::/wwwroot::reiserfs nfsserver smb nmb
>
> SNSfile01:/var/log # procinfo
> Linux 2.6.27.7-9-pae (ge...@buildhost) (gcc 4.3.2) #1 SMP 2008-12-04 18:10:04 +0100 1CPU [SNSfile01.]
>
> Memory:      Total        Used        Free      Shared     Buffers      Cached
> Mem:       3962268     1287900     2674368           0      141696      996976
> Swap:      4192956           0     4192956
>
> Bootup: Sat Aug 28 08:14:18 2010    Load average: 8.90 7.90 7.65 11/167 23585
>
> user  :      16:48:33.65  30.1%  page in :    608752  disk 1:    12118r  347678w
> nice  :       0:00:19.51   0.0%  page out:   7255902  disk 2:    29748r  240080w
> system:       0:10:29.42   0.3%  page act:    136932
> IOwait:       0:17:25.57   0.5%  page dea:         0
> hw irq:       0:00:39.38   0.0%  page flt:  43102070
> sw irq:       0:03:29.61   0.1%  swap in :         0
> idle  :   1d 14:18:20.75  68.7%  swap out:         0
> uptime:   2d  7:47:14.95         context : 118992651
>
> irq  0:        75 timer                 irq 12:        92 i8042
> irq  1:         8 i8042                 irq 14:    175143 ata_piix
> irq  3:         1                       irq 15:         0 ata_piix
> irq  4:         1                       irq 16:         0 vmci
> irq  6:         5 floppy [2]            irq 17:    429639 ioc0
> irq  7:         0 parport0              irq 18:  74429101 vmxnet ether
> irq  8:         0 rtc0                  irq 19:   1158878 vmxnet ether
> irq  9:         0 acpi
>
> SNSfile01:/var/log # tail ha-log ha-debug
> ==> ha-log <==
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)
>
> ==> ha-debug <==
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)
>
> SNSfile01:/var/log # zypper info heartbeat
> Loading repository data...
> Reading installed packages...
>
> Information for package heartbeat:
>
> Repository: @System
> Name: heartbeat
> Version: 2.99.3-1.6
> Arch: i586
> Vendor: openSUSE
> Installed: Yes
> Status: up-to-date
> Installed Size: 1.0 M
> Summary: The Heartbeat Subsystem for High-Availability Linux
> Description:
> heartbeat is a sophisticated multinode resource manager for High
> Availability clusters.
>
> It can failover arbitrary resources, ranging from IP addresses over NFS
> to databases that are tied in via resource scripts. The resources can
> have arbitrary dependencies for ordering or placement between them.
>
> heartbeat contains a cluster membership layer, fencing, and local and
> clusterwide resource management functionality.
>
> 1.2/1.0 based 2-node only configurations are supported in a legacy
> mode.
>
> heartbeat implements the following kinds of heartbeats:
>
> - Serial ports
>
> - UDP/IPv4 broadcast, multi-cast, and unicast
>
> - IPv4 "ping" pseudo-cluster members.
>

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
