> Backups are known to be quite demanding. I'd suggest
> collecting data with sar.
Done.
The backup starts at 1:45 AM and finishes at 4:07 AM.
I got the warning again 4 times last night between 1:50 AM and 4:00 AM.
So your theory may be correct. However...
The longest delay was at 1:50:29 AM:
heartbeat[9953]: 2009/10/15_01:50:29 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 10040 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)
Here is the sar output for that same time. Note that the load is not
especially high...
CPU %user %nice %system %iowait
%steal %idle
01:50:15 AM all 0.75 0.00 3.00 2.62 0.00
93.62
01:50:16 AM all 0.75 0.00 3.00 2.62 0.00
93.62
01:50:17 AM all 0.87 0.00 6.98 5.61 0.00
86.53
01:50:18 AM all 0.75 0.00 7.63 15.27 0.00
76.35
01:50:19 AM all 1.00 0.00 3.49 11.10 0.00
84.41
01:50:20 AM all 0.88 0.00 3.38 10.50 0.00
85.25
01:50:21 AM all 0.75 0.00 3.50 10.61 0.00
85.14
01:50:22 AM all 1.25 0.00 3.00 10.74 0.00
85.02
01:50:23 AM all 1.25 0.00 3.00 11.11 0.00
84.64
01:50:24 AM all 1.37 0.00 3.00 10.99 0.00
84.64
01:50:25 AM all 1.25 0.00 3.25 10.62 0.00
84.88
01:50:26 AM all 1.25 0.00 3.25 12.38 0.00
83.12
01:50:27 AM all 1.37 0.00 3.37 12.84 0.00
82.42
01:50:28 AM all 1.25 0.00 3.38 10.62 0.00
84.75
01:50:29 AM all 1.25 0.00 3.62 12.25 0.00
82.88
01:50:30 AM all 1.37 0.00 3.12 15.71 0.00
79.80
01:50:31 AM all 1.38 0.00 3.25 12.38 0.00
83.00
01:50:32 AM all 1.25 0.00 3.37 10.61 0.00
84.77
01:50:33 AM all 1.12 0.00 2.87 7.87 0.00
88.14
01:50:34 AM all 0.25 0.00 0.62 0.50 0.00
98.62
01:50:35 AM all 0.62 0.00 1.25 2.25 0.00
95.88
01:50:36 AM all 0.75 0.00 1.62 1.87 0.00
95.76
01:50:37 AM all 0.62 0.00 1.50 1.88 0.00
96.00
01:50:38 AM all 0.75 0.00 1.75 2.38 0.00
95.12
01:50:39 AM all 0.75 0.00 1.75 2.75 0.00
94.76
01:50:40 AM all 0.50 0.00 1.75 5.88 0.00
91.86
The next longest delay was at 2:00:22 AM:
heartbeat[9953]: 2009/10/15_02:00:22 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 4900 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)
During this period, the load was quite high...
CPU %user %nice %system %iowait
%steal %idle
02:00:07 AM all 0.00 0.00 0.00 25.03 0.00
74.97
02:00:08 AM all 0.75 0.00 2.37 34.58 0.00
62.30
02:00:09 AM all 1.00 0.00 2.37 32.21 0.00
64.42
02:00:10 AM all 6.26 0.00 9.26 45.93 0.00
38.55
02:00:11 AM all 7.35 0.00 10.59 59.90 0.00
22.17
02:00:12 AM all 25.09 0.00 13.86 53.68 0.00
7.37
02:00:13 AM all 23.12 0.00 13.75 54.12 0.00
9.00
02:00:14 AM all 26.22 0.00 26.84 41.20 0.00
5.74
02:00:15 AM all 17.60 0.00 24.84 46.44 0.00
11.11
02:00:16 AM all 17.08 0.00 25.19 47.88 0.00
9.85
02:00:17 AM all 15.11 0.00 30.46 41.45 0.00
12.98
02:00:18 AM all 23.88 0.00 25.75 42.00 0.00
8.38
02:00:19 AM all 15.61 0.00 21.72 53.93 0.00
8.74
02:00:20 AM all 9.11 0.00 20.35 62.17 0.00
8.36
02:00:21 AM all 7.88 0.00 17.65 64.96 0.00
9.51
02:00:22 AM all 6.49 0.00 9.24 60.55 0.00
23.72
02:00:23 AM all 10.49 0.00 5.99 65.67 0.00
17.85
02:00:24 AM all 30.80 0.00 21.45 35.41 0.00
12.34
02:00:25 AM all 21.88 0.00 25.50 46.62 0.00
6.00
02:00:26 AM all 27.88 0.00 25.50 40.50 0.00
6.12
02:00:27 AM all 17.12 0.00 26.62 48.62 0.00
7.62
02:00:28 AM all 20.05 0.00 25.78 50.56 0.00
3.61
02:00:29 AM all 20.75 0.00 23.12 48.75 0.00
7.38
02:00:30 AM all 13.86 0.00 22.10 61.17 0.00
2.87
02:00:31 AM all 14.98 0.00 22.10 55.43 0.00
7.49
02:00:32 AM all 18.93 0.00 21.67 52.30 0.00
7.10
02:00:33 AM all 29.82 0.00 24.31 37.72 0.00
8.15
02:00:34 AM all 34.83 0.00 30.34 32.08 0.00
2.75
02:00:35 AM all 37.58 0.00 29.96 28.46 0.00
4.00
02:00:36 AM all 30.46 0.00 25.47 39.95 0.00
4.12
02:00:37 AM all 24.72 0.00 26.97 39.45 0.00
8.86
02:00:38 AM all 23.53 0.00 26.41 45.43 0.00
4.63
02:00:39 AM all 26.22 0.00 23.72 41.57 0.00
8.49
The next longest delay was at 4:00:23 AM:
heartbeat[9953]: 2009/10/15_04:00:23 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 3160 ms (> 1010 ms)
before being called (GSource: 0x9c282c8)
Unfortunately, my sar script stopped capturing at 3:00, so I don't know
if the load was also high during this period. What I *do* know is that
the longest delay warning occurred when the system load was relatively
low.
I would appreciate your follow-up thoughts. What would cause heartbeat
to be delayed in sending the local status when the load was low?
--
Eric Robinson
Disclaimer - October 15, 2009
This email and any files transmitted with it are confidential and intended
solely for General Linux-HA mailing list. If you are not the named addressee
you should not disseminate, distribute, copy or alter this email. Any views or
opinions presented in this email are solely those of the author and might not
represent those of Physician Select Management and Physician's Managed Care.
Warning: Although Physician Select Management and Physician's Managed Care have
taken reasonable precautions to ensure no viruses are present in this email,
the companies cannot accept responsibility for any loss or damage arising from
the use of this email or attachments.
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems