The process's memory size keeps increasing. Could this be caused by a
memory leak?
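Comparing top snapshots by eye is imprecise. A more reliable way to confirm the growth is to log the resident set size at a fixed interval and check whether it ever levels off. The sketch below assumes Python 3 is available on the cluster node and reads the VmRSS line from /proc/<pid>/status, which Linux provides for every process. The helper names, interval and sample count are illustrative choices, not anything Heartbeat ships.

```python
import time
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical helpers for illustration; not part of Heartbeat.


def read_vmrss_kb(status_text: str) -> int:
    """Return the resident set size (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  101884 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line (kernel thread, or process gone?)")


def sample_rss(pid: int, interval_s: float = 60.0, count: int = 10) -> Iterator[Tuple[float, int]]:
    """Yield (unix_time, rss_kb) for `pid`, once every `interval_s` seconds."""
    status_file = Path(f"/proc/{pid}/status")
    for i in range(count):
        yield time.time(), read_vmrss_kb(status_file.read_text())
        if i + 1 < count:  # no need to sleep after the last sample
            time.sleep(interval_s)
```

For example, `sample_rss(3358, interval_s=300, count=48)` yields about four hours of five-minute samples for the master control process. A leak shows up as a series that keeps climbing instead of flattening out.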

# top -b -d 2 -n 2 -p 3358 ; gstack 3358
top - 23:23:25 up 2 days, 15:09,  1 user,  load average: 12.33, 12.03, 11.56
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 37.8%us,  0.3%sy,  0.0%ni, 61.1%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3962268k total,  1440752k used,  2521516k free,   142516k buffers
Swap:  4192956k total,        0k used,  4192956k free,  1110052k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  118m 118m 5384 R 95.5  3.1   1447:46 heartbeat

top - 23:23:27 up 2 days, 15:09,  1 user,  load average: 12.33, 12.03, 11.56
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 95.0%us,  0.0%sy,  0.0%ni,  0.0%id,  4.5%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   3962268k total,  1435076k used,  2527192k free,   142516k buffers
Swap:  4192956k total,        0k used,  4192956k free,  1110060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  118m 118m 5384 R 94.5  3.1   1447:48 heartbeat

#0  0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
#1  0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
#2  0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3  0x08056ae8 in ?? ()
#4  0x0805a247 in main ()
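A single gstack snapshot only shows where the process happened to be at one instant. Taking many snapshots and counting the innermost frame (the technique often called the poor man's profiler) gives a rough picture of where the spinning master control process spends its time. This sketch assumes gstack is on the PATH, as it is here, and Python 3.7 or later. The helper names are made up for the example. Each gstack call attaches gdb and briefly stops the process, so keep the sample count modest on a production cluster node.

```python
import subprocess
import time
from collections import Counter

# Hypothetical helpers for illustration; not part of Heartbeat or gdb.


def top_frame(stack_text: str) -> str:
    """Return the function name in frame #0 of gstack/gdb backtrace output."""
    for line in stack_text.splitlines():
        if line.startswith("#0"):
            rest = line[2:].strip()
            # gdb usually prints "0xADDR in func () from lib" but may omit the
            # address, so handle both forms.
            if rest.startswith("0x") and " in " in rest:
                rest = rest.split(" in ", 1)[1]
            return rest.split(" (", 1)[0]
    return "<no frame>"


def profile(pid: int, samples: int = 20, delay_s: float = 0.5) -> Counter:
    """Run gstack `samples` times and count the innermost function each time."""
    counts: Counter = Counter()
    for _ in range(samples):
        # Each gstack call attaches gdb, which briefly stops the process.
        result = subprocess.run(["gstack", str(pid)], capture_output=True, text=True)
        counts[top_frame(result.stdout)] += 1
        time.sleep(delay_s)
    return counts
```

`profile(3358)` returns a Counter of innermost functions. If nearly every sample lands in g_main_context_prepare, that might suggest the main loop is spending its time walking a very large set of event sources rather than handling I/O. Treat that as a hypothesis to check, not a conclusion.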
# top -b -d 2 -n 2 -p 3358 && gstack 3358
top - 16:47:50 up 2 days,  8:33,  3 users,  load average: 7.90, 7.57, 7.33
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.1%us,  0.3%sy,  0.0%ni, 68.0%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3962268k total,  1309556k used,  2652712k free,   141904k buffers
Swap:  4192956k total,        0k used,  4192956k free,   998576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  101m 101m 5384 R 95.5  2.6   1072:25 heartbeat

top - 16:47:52 up 2 days,  8:33,  3 users,  load average: 7.90, 7.57, 7.33
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 95.5%us,  0.0%sy,  0.0%ni,  4.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3962268k total,  1309580k used,  2652688k free,   141904k buffers
Swap:  4192956k total,        0k used,  4192956k free,   998636k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3358 root      -2   0  101m 101m 5384 R 94.5  2.6   1072:27 heartbeat
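The three top snapshots in this thread (16:00:33 from the original report, and 16:47:50 and 23:23:25 from the follow-up, all on Aug 30) are enough to put numbers on the growth. The sketch below does that arithmetic with the RES and TIME+ columns. Because top rounds RES to the nearest megabyte, treat the result as approximate.

```python
from datetime import datetime

# (timestamp, RES in MB as shown by top, TIME+ as "minutes:seconds"),
# copied from the three top snapshots in this thread.
SAMPLES = [
    ("2010-08-30 16:00:33", 98, "1027:44"),
    ("2010-08-30 16:47:50", 101, "1072:25"),
    ("2010-08-30 23:23:25", 118, "1447:46"),
]


def cpu_seconds(time_plus: str) -> float:
    """Convert top's TIME+ ("minutes:seconds[.hundredths]") to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def growth(samples):
    """Return (MB of RES gained per hour, fraction of one CPU used) from first to last sample."""
    fmt = "%Y-%m-%d %H:%M:%S"
    (t0, res0, cpu0), (t1, res1, cpu1) = samples[0], samples[-1]
    wall_s = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    mb_per_hour = (res1 - res0) / (wall_s / 3600)
    cpu_fraction = (cpu_seconds(cpu1) - cpu_seconds(cpu0)) / wall_s
    return mb_per_hour, cpu_fraction


mb_per_hour, cpu_fraction = growth(SAMPLES)
print(f"RES growth: {mb_per_hour:.1f} MB/hour, CPU: {cpu_fraction:.0%} of one core")
# prints: RES growth: 2.7 MB/hour, CPU: 95% of one core
```

Over those roughly 7.4 hours, resident memory grows by about 2.7 MB per hour while the process uses about 95% of the single CPU. Steady growth like this fits a leak, but it would equally fit an internal queue or list that keeps growing. The numbers alone cannot tell the two apart.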

On Mon, Aug 30, 2010 at 4:15 PM, Oozzzii Oz <[email protected]> wrote:

> Heartbeat has been pegging the CPU on the primary DRBD cluster node for hours
> now. I see some timeout errors in the logs, but nothing else to indicate why
> the heartbeat process is consuming so many CPU cycles. Its memory size is
> significantly larger than on similar systems, where it is usually only 13 MB
> and uses about 0.4% CPU.
>
> Can anyone share some tips on where I might look for the probable cause? I'm
> sharing as much detail as possible about the current setup.
>
> SNSfile01:/var/log # top -b -d 2 -n 2 -p 3358
> top - 16:00:33 up 2 days,  7:46,  2 users,  load average: 7.59, 7.54, 7.53
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 30.2%us,  0.3%sy,  0.0%ni, 68.8%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
> Mem:   3962268k total,  1287412k used,  2674856k free,   141692k buffers
> Swap:  4192956k total,        0k used,  4192956k free,   982732k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3358 root      -2   0 99.0m  98m 5384 R 95.9  2.6   1027:44 heartbeat
>
>
> top - 16:00:35 up 2 days,  7:46,  2 users,  load average: 8.02, 7.63, 7.56
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s): 93.1%us,  0.5%sy,  0.0%ni,  6.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   3962268k total,  1287412k used,  2674856k free,   141692k buffers
> Swap:  4192956k total,        0k used,  4192956k free,   982732k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3358 root      -2   0 99.0m  98m 5384 S 93.1  2.6   1027:46 heartbeat
>
> gstack 3358
> #0  0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
> #1  0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
> #2  0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
> #3  0x08056ae8 in ?? ()
> #4  0x0805a247 in main ()
>
> SNSfile01:/var/log # ps aux|grep -i heart
> root      3358 30.9  2.5 101888 101884 ?       RLs  Aug28 1038:38 heartbeat: master control process
> nobody    3367  0.0  0.1   6720  6716 ?        SL   Aug28   0:04 heartbeat: FIFO reader
> nobody    3368  0.0  0.1   6716  6712 ?        RL   Aug28   1:41 heartbeat: write: bcast eth3
> nobody    3369  0.0  0.1   6716  6712 ?        SL   Aug28   0:24 heartbeat: read: bcast eth3
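ps and top disagree about CPU use (30.9% versus about 95%) because they measure different things. ps's %CPU column is CPU time divided by the time since the process started, while top reports usage over its refresh interval. The sketch below reproduces the ps figure. It assumes heartbeat started at boot (Sat Aug 28 08:14) and that ps ran about 2 days 8 hours later; the thread does not give the exact time ps was run. Separately, the L in the RLs state means the process has pages locked into memory, which is why VSZ and RSS are practically identical. None of the growth is being paged out.

```python
def cpu_time_seconds(t: str) -> int:
    """Convert a ps TIME value ("[dd-][hh:]mm:ss", or "mmm:ss" as here) to seconds."""
    days = 0
    if "-" in t:
        d, t = t.split("-", 1)
        days = int(d)
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def lifetime_cpu_percent(cpu_time: str, elapsed_s: float) -> float:
    """ps-style %CPU: CPU time consumed divided by wall-clock time since the process started."""
    return 100.0 * cpu_time_seconds(cpu_time) / elapsed_s


# Assumption: heartbeat started at boot (Sat Aug 28 08:14) and ps ran about
# 2 days 8 hours later; the thread does not give the exact time ps was run.
ELAPSED_S = 2 * 86400 + 8 * 3600
print(f"{lifetime_cpu_percent('1038:38', ELAPSED_S):.1f}%")  # prints 30.9%
```

A lifetime average of about 31% alongside a current 95% means the process used much less CPU for part of its life. That suggests the spinning began at some point after boot rather than at startup.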
>
>
> SNSfile01:/var/log # more /etc/ha.d/ha.cf /etc/ha.d/haresources
> ::::::::::::::
> /etc/ha.d/ha.cf
> ::::::::::::::
> logfile /var/log/ha-log
> debugfile /var/log/ha-debug
> bcast   eth3
> udpport 694
> warntime 8
> deadtime 30
> initdead 120
> keepalive 2
> auto_failback on
> node SNSfile01
> node SNSfile02
> ::::::::::::::
> /etc/ha.d/haresources
> ::::::::::::::
> SNSfile01 IPaddr::10.10.1.180/24 drbddisk::r0
> Filesystem::/dev/drbd0::/wwwroot::reiserfs nfsserver smb nmb
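The ha.cf timing directives are in seconds, and their relative order matters more than their absolute values. The sketch below performs a quick sanity check using the usual rules of thumb: keepalive below warntime, warntime below deadtime, and initdead at least twice deadtime. The parser is a minimal illustration that only handles the simple `directive value` lines used here. Support for the `ms` suffix is an assumption of the sketch.

```python
# The ha.cf from this thread, copied verbatim.
HA_CF = """\
logfile /var/log/ha-log
debugfile /var/log/ha-debug
bcast   eth3
udpport 694
warntime 8
deadtime 30
initdead 120
keepalive 2
auto_failback on
node SNSfile01
node SNSfile02
"""


def parse_ha_cf(text: str) -> dict:
    """Map each directive to the list of its values, skipping blanks and # comments."""
    directives: dict = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def to_seconds(value: str) -> float:
    """Timing values are seconds; an "ms" suffix (assumed supported) means milliseconds."""
    return float(value[:-2]) / 1000 if value.endswith("ms") else float(value)


def check_timing(cfg: dict) -> list:
    """Return rule-of-thumb violations among keepalive/warntime/deadtime/initdead."""
    names = ("keepalive", "warntime", "deadtime", "initdead")
    t = {k: to_seconds(cfg[k][-1]) for k in names if k in cfg}
    problems = []
    if "keepalive" in t and "warntime" in t and t["keepalive"] >= t["warntime"]:
        problems.append("keepalive should be shorter than warntime")
    if "warntime" in t and "deadtime" in t and t["warntime"] >= t["deadtime"]:
        problems.append("warntime should be shorter than deadtime")
    if "deadtime" in t and "initdead" in t and t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least twice deadtime")
    return problems


print(check_timing(parse_ha_cf(HA_CF)) or "timing directives look consistent")
```

For the configuration above it reports no problems, so the timing settings on their own do not look like the cause.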
>
> SNSfile01:/var/log # procinfo
> Linux 2.6.27.7-9-pae (ge...@buildhost) (gcc 4.3.2) #1 SMP 2008-12-04 18:10:04 +0100 1CPU [SNSfile01.]
>
> Memory:      Total        Used        Free      Shared     Buffers      Cached
> Mem:       3962268     1287900     2674368           0      141696      996976
> Swap:      4192956           0     4192956
>
> Bootup: Sat Aug 28 08:14:18 2010    Load average: 8.90 7.90 7.65 11/167 23585
>
> user  :      16:48:33.65  30.1%  page in :     608752  disk 1:    12118r   347678w
> nice  :       0:00:19.51   0.0%  page out:    7255902  disk 2:    29748r   240080w
> system:       0:10:29.42   0.3%  page act:     136932
> IOwait:       0:17:25.57   0.5%  page dea:          0
> hw irq:       0:00:39.38   0.0%  page flt:   43102070
> sw irq:       0:03:29.61   0.1%  swap in :          0
> idle  :   1d 14:18:20.75  68.7%  swap out:          0
> uptime:   2d  7:47:14.95         context :  118992651
>
> irq  0:        75 timer                 irq 12:        92 i8042
> irq  1:         8 i8042                 irq 14:    175143 ata_piix
> irq  3:         1                       irq 15:         0 ata_piix
> irq  4:         1                       irq 16:         0 vmci
> irq  6:         5 floppy [2]            irq 17:    429639 ioc0
> irq  7:         0 parport0              irq 18:  74429101 vmxnet ether
> irq  8:         0 rtc0                  irq 19:   1158878 vmxnet ether
> irq  9:         0 acpi
>
> SNSfile01:/var/log # tail  ha-log ha-debug
> ==> ha-log <==
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)
>
> ==> ha-debug <==
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
> heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
> heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)
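tail only shows the last ten warnings. Summarizing every slow-dispatch warning in the whole log would show whether they are constant, which callback produces them, and how slow it is. The sketch below is built around the message format quoted above. The regular expression, function name and field names are assumptions made for the example, not part of Heartbeat. Feed it the raw /var/log/ha-log file, not the quoted copy in this email, because the > markers would break the match.

```python
import re
from collections import Counter
from statistics import mean

# Matches heartbeat's "Dispatch function for X took too long" warnings as quoted in this thread.
WARN_RE = re.compile(
    r"Dispatch function for (?P<what>.+?) took too long to execute: "
    r"(?P<ms>\d+) ms \(> (?P<limit>\d+) ms\) \(GSource: (?P<src>0x[0-9a-fA-F]+)\)"
)


def summarize(log_text: str) -> dict:
    """Summarize slow-dispatch warnings in raw ha-log text (wrapped lines are rejoined)."""
    flat = " ".join(log_text.split())  # collapse line breaks so wrapped entries still match
    hits = list(WARN_RE.finditer(flat))
    durations = [int(m["ms"]) for m in hits]
    sources = [int(m["src"], 16) for m in hits]
    return {
        "count": len(hits),
        "by_callback": Counter(m["what"] for m in hits),
        "min_ms": min(durations, default=0),
        "max_ms": max(durations, default=0),
        "mean_ms": mean(durations) if durations else 0,
        # True when every GSource address is higher than the one before it
        "sources_increasing": all(b > a for a, b in zip(sources, sources[1:])),
    }
```

Run over the ten entries quoted above, it reports 10 warnings, all from the retransmit request callback. They take 60 to 100 ms (mean 73 ms) against a 10 ms limit, and the GSource addresses only ever increase. Steadily climbing addresses are consistent with a fresh timeout source being allocated for each retransmit request, which would also fit the memory growth. That is an inference from ten lines, though, not a diagnosis.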
>
> SNSfile01:/var/log # zypper info heartbeat
> Loading repository data...
> Reading installed packages...
>
> Information for package heartbeat:
>
> Repository: @System
> Name: heartbeat
> Version: 2.99.3-1.6
> Arch: i586
> Vendor: openSUSE
> Installed: Yes
> Status: up-to-date
> Installed Size: 1.0 M
> Summary: The Heartbeat Subsystem for High-Availability Linux
> Description:
> heartbeat is a sophisticated multinode resource manager for High
> Availability clusters.
>
> It can failover arbitrary resources, ranging from IP addresses over NFS
> to databases that are tied in via resource scripts. The resources can
> have arbitrary dependencies for ordering or placement between them.
>
> heartbeat contains a cluster membership layer, fencing, and local and
> clusterwide resource management functionality.
>
> 1.2/1.0 based 2-node only configurations are supported in a legacy
> mode.
>
> heartbeat implements the following kinds of heartbeats:
>
> - Serial ports
>
> - UDP/IPv4 broadcast, multi-cast, and unicast
>
> - IPv4 "ping" pseudo-cluster members.
>
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
