> Backups are known to be quite demanding. I'd suggest 
> collecting data with sar.

Done.

The backup starts at 1:45 AM and finishes at 4:07 AM. 

I got the warning again 4 times last night between 1:50 AM and 4:00 AM.
So your theory may be correct. However... 

The longest delay was at 1:50:29 AM:

heartbeat[9953]: 2009/10/15_01:50:29 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 10040 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)

Here is the sar output for that same time. Note that the load is not
especially high...

                    CPU     %user     %nice   %system   %iowait
%steal     %idle
01:50:15 AM     all      0.75      0.00      3.00      2.62      0.00
93.62
01:50:16 AM     all      0.75      0.00      3.00      2.62      0.00
93.62
01:50:17 AM     all      0.87      0.00      6.98      5.61      0.00
86.53
01:50:18 AM     all      0.75      0.00      7.63     15.27      0.00
76.35
01:50:19 AM     all      1.00      0.00      3.49     11.10      0.00
84.41
01:50:20 AM     all      0.88      0.00      3.38     10.50      0.00
85.25
01:50:21 AM     all      0.75      0.00      3.50     10.61      0.00
85.14
01:50:22 AM     all      1.25      0.00      3.00     10.74      0.00
85.02
01:50:23 AM     all      1.25      0.00      3.00     11.11      0.00
84.64
01:50:24 AM     all      1.37      0.00      3.00     10.99      0.00
84.64
01:50:25 AM     all      1.25      0.00      3.25     10.62      0.00
84.88
01:50:26 AM     all      1.25      0.00      3.25     12.38      0.00
83.12
01:50:27 AM     all      1.37      0.00      3.37     12.84      0.00
82.42
01:50:28 AM     all      1.25      0.00      3.38     10.62      0.00
84.75
01:50:29 AM     all      1.25      0.00      3.62     12.25      0.00
82.88
01:50:30 AM     all      1.37      0.00      3.12     15.71      0.00
79.80
01:50:31 AM     all      1.38      0.00      3.25     12.38      0.00
83.00
01:50:32 AM     all      1.25      0.00      3.37     10.61      0.00
84.77
01:50:33 AM     all      1.12      0.00      2.87      7.87      0.00
88.14
01:50:34 AM     all      0.25      0.00      0.62      0.50      0.00
98.62
01:50:35 AM     all      0.62      0.00      1.25      2.25      0.00
95.88
01:50:36 AM     all      0.75      0.00      1.62      1.87      0.00
95.76
01:50:37 AM     all      0.62      0.00      1.50      1.88      0.00
96.00
01:50:38 AM     all      0.75      0.00      1.75      2.38      0.00
95.12
01:50:39 AM     all      0.75      0.00      1.75      2.75      0.00
94.76
01:50:40 AM     all      0.50      0.00      1.75      5.88      0.00
91.86

The next longest delay was at 2:00:22 AM:

heartbeat[9953]: 2009/10/15_02:00:22 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 4900 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)

During this period, the load was quite high...

                    CPU     %user     %nice   %system   %iowait
%steal     %idle
02:00:07 AM     all      0.00      0.00      0.00     25.03      0.00
74.97
02:00:08 AM     all      0.75      0.00      2.37     34.58      0.00
62.30
02:00:09 AM     all      1.00      0.00      2.37     32.21      0.00
64.42
02:00:10 AM     all      6.26      0.00      9.26     45.93      0.00
38.55
02:00:11 AM     all      7.35      0.00     10.59     59.90      0.00
22.17
02:00:12 AM     all     25.09      0.00     13.86     53.68      0.00
7.37
02:00:13 AM     all     23.12      0.00     13.75     54.12      0.00
9.00
02:00:14 AM     all     26.22      0.00     26.84     41.20      0.00
5.74
02:00:15 AM     all     17.60      0.00     24.84     46.44      0.00
11.11
02:00:16 AM     all     17.08      0.00     25.19     47.88      0.00
9.85
02:00:17 AM     all     15.11      0.00     30.46     41.45      0.00
12.98
02:00:18 AM     all     23.88      0.00     25.75     42.00      0.00
8.38
02:00:19 AM     all     15.61      0.00     21.72     53.93      0.00
8.74
02:00:20 AM     all      9.11      0.00     20.35     62.17      0.00
8.36
02:00:21 AM     all      7.88      0.00     17.65     64.96      0.00
9.51
02:00:22 AM     all      6.49      0.00      9.24     60.55      0.00
23.72
02:00:23 AM     all     10.49      0.00      5.99     65.67      0.00
17.85
02:00:24 AM     all     30.80      0.00     21.45     35.41      0.00
12.34
02:00:25 AM     all     21.88      0.00     25.50     46.62      0.00
6.00
02:00:26 AM     all     27.88      0.00     25.50     40.50      0.00
6.12
02:00:27 AM     all     17.12      0.00     26.62     48.62      0.00
7.62
02:00:28 AM     all     20.05      0.00     25.78     50.56      0.00
3.61
02:00:29 AM     all     20.75      0.00     23.12     48.75      0.00
7.38
02:00:30 AM     all     13.86      0.00     22.10     61.17      0.00
2.87
02:00:31 AM     all     14.98      0.00     22.10     55.43      0.00
7.49
02:00:32 AM     all     18.93      0.00     21.67     52.30      0.00
7.10
02:00:33 AM     all     29.82      0.00     24.31     37.72      0.00
8.15
02:00:34 AM     all     34.83      0.00     30.34     32.08      0.00
2.75
02:00:35 AM     all     37.58      0.00     29.96     28.46      0.00
4.00
02:00:36 AM     all     30.46      0.00     25.47     39.95      0.00
4.12
02:00:37 AM     all     24.72      0.00     26.97     39.45      0.00
8.86
02:00:38 AM     all     23.53      0.00     26.41     45.43      0.00
4.63
02:00:39 AM     all     26.22      0.00     23.72     41.57      0.00
8.49


The next longest delay was at 4:00:23 AM:

heartbeat[9953]: 2009/10/15_04:00:23 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 3160 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)

Unfortunately, my sar script stopped capturing at 3:00, so I don't know
if the load was also high during this period. What I *do* know is that
the longest delay warning occurred when the system load was relatively
low.

I would appreciate your follow-up thoughts. What would cause heartbeat
to be delayed in sending the local status when the load was low? 

--
Eric Robinson


Disclaimer - October 15, 2009 
This email and any files transmitted with it are confidential and intended 
solely for General Linux-HA mailing list. If you are not the named addressee 
you should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management and Physician's Managed Care. 
Warning: Although Physician Select Management and Physician's Managed Care have 
taken reasonable precautions to ensure no viruses are present in this email, 
the companies cannot accept responsibility for any loss or damage arising from 
the use of this email or attachments. 
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to