Hi, On Thu, Oct 15, 2009 at 07:02:55AM -0700, Robinson, Eric wrote: > > > Backups are known to be quite demanding. I'd suggest > > collecting data with sar. > > Done. > > The backup starts at 1:45 AM and finishes at 4:07 AM. > > I got the warning again 4 times last night between 1:50 AM and 4:00 AM. > So your theory may be correct. However... > > The longest delay was at 1:50:29 AM: > > heartbeat[9953]: 2009/10/15_01:50:29 WARN: Gmain_timeout_dispatch: > Dispatch function for send local status was delayed 10040 ms (> 1010 ms) > before being called (GSource: 0x9c282c8) > > Here is the sar output for that same time. Note that the load is not > especially high... > > CPU %user %nice %system %iowait > %steal %idle > 01:50:15 AM all 0.75 0.00 3.00 2.62 0.00 > 93.62 > 01:50:16 AM all 0.75 0.00 3.00 2.62 0.00 > 93.62 > 01:50:17 AM all 0.87 0.00 6.98 5.61 0.00 > 86.53 > 01:50:18 AM all 0.75 0.00 7.63 15.27 0.00 > 76.35 > 01:50:19 AM all 1.00 0.00 3.49 11.10 0.00 > 84.41 > 01:50:20 AM all 0.88 0.00 3.38 10.50 0.00 > 85.25 > 01:50:21 AM all 0.75 0.00 3.50 10.61 0.00 > 85.14 > 01:50:22 AM all 1.25 0.00 3.00 10.74 0.00 > 85.02 > 01:50:23 AM all 1.25 0.00 3.00 11.11 0.00 > 84.64 > 01:50:24 AM all 1.37 0.00 3.00 10.99 0.00 > 84.64 > 01:50:25 AM all 1.25 0.00 3.25 10.62 0.00 > 84.88 > 01:50:26 AM all 1.25 0.00 3.25 12.38 0.00 > 83.12 > 01:50:27 AM all 1.37 0.00 3.37 12.84 0.00 > 82.42 > 01:50:28 AM all 1.25 0.00 3.38 10.62 0.00 > 84.75 > 01:50:29 AM all 1.25 0.00 3.62 12.25 0.00 > 82.88 > 01:50:30 AM all 1.37 0.00 3.12 15.71 0.00 > 79.80 > 01:50:31 AM all 1.38 0.00 3.25 12.38 0.00 > 83.00 > 01:50:32 AM all 1.25 0.00 3.37 10.61 0.00 > 84.77 > 01:50:33 AM all 1.12 0.00 2.87 7.87 0.00 > 88.14 > 01:50:34 AM all 0.25 0.00 0.62 0.50 0.00 > 98.62 > 01:50:35 AM all 0.62 0.00 1.25 2.25 0.00 > 95.88 > 01:50:36 AM all 0.75 0.00 1.62 1.87 0.00 > 95.76 > 01:50:37 AM all 0.62 0.00 1.50 1.88 0.00 > 96.00 > 01:50:38 AM all 0.75 0.00 1.75 2.38 0.00 > 95.12 > 01:50:39 AM all 0.75 0.00 1.75 2.75 0.00 > 94.76 > 01:50:40 AM all 0.50 0.00 1.75 5.88 0.00 > 91.86 > > The next longest delay was at 2:00:22 AM: > > heartbeat[9953]: 2009/10/15_02:00:22 WARN: Gmain_timeout_dispatch: > Dispatch function for send local status was delayed 4900 ms (> 1010 ms) > before being called (GSource: 0x9c282c8) > > During this period, the load was quite high... > > CPU %user %nice %system %iowait > %steal %idle > 02:00:07 AM all 0.00 0.00 0.00 25.03 0.00 > 74.97 > 02:00:08 AM all 0.75 0.00 2.37 34.58 0.00 > 62.30 > 02:00:09 AM all 1.00 0.00 2.37 32.21 0.00 > 64.42 > 02:00:10 AM all 6.26 0.00 9.26 45.93 0.00 > 38.55 > 02:00:11 AM all 7.35 0.00 10.59 59.90 0.00 > 22.17 > 02:00:12 AM all 25.09 0.00 13.86 53.68 0.00 > 7.37 > 02:00:13 AM all 23.12 0.00 13.75 54.12 0.00 > 9.00 > 02:00:14 AM all 26.22 0.00 26.84 41.20 0.00 > 5.74 > 02:00:15 AM all 17.60 0.00 24.84 46.44 0.00 > 11.11 > 02:00:16 AM all 17.08 0.00 25.19 47.88 0.00 > 9.85 > 02:00:17 AM all 15.11 0.00 30.46 41.45 0.00 > 12.98 > 02:00:18 AM all 23.88 0.00 25.75 42.00 0.00 > 8.38 > 02:00:19 AM all 15.61 0.00 21.72 53.93 0.00 > 8.74 > 02:00:20 AM all 9.11 0.00 20.35 62.17 0.00 > 8.36 > 02:00:21 AM all 7.88 0.00 17.65 64.96 0.00 > 9.51 > 02:00:22 AM all 6.49 0.00 9.24 60.55 0.00 > 23.72 > 02:00:23 AM all 10.49 0.00 5.99 65.67 0.00 > 17.85 > 02:00:24 AM all 30.80 0.00 21.45 35.41 0.00 > 12.34 > 02:00:25 AM all 21.88 0.00 25.50 46.62 0.00 > 6.00 > 02:00:26 AM all 27.88 0.00 25.50 40.50 0.00 > 6.12 > 02:00:27 AM all 17.12 0.00 26.62 48.62 0.00 > 7.62 > 02:00:28 AM all 20.05 0.00 25.78 50.56 0.00 > 3.61 > 02:00:29 AM all 20.75 0.00 23.12 48.75 0.00 > 7.38 > 02:00:30 AM all 13.86 0.00 22.10 61.17 0.00 > 2.87 > 02:00:31 AM all 14.98 0.00 22.10 55.43 0.00 > 7.49 > 02:00:32 AM all 18.93 0.00 21.67 52.30 0.00 > 7.10 > 02:00:33 AM all 29.82 0.00 24.31 37.72 0.00 > 8.15 > 02:00:34 AM all 34.83 0.00 30.34 32.08 0.00 > 2.75 > 02:00:35 AM all 37.58 0.00 29.96 28.46 0.00 > 4.00 > 02:00:36 AM all 30.46 0.00 25.47 39.95 0.00 > 4.12 > 02:00:37 AM all 24.72 0.00 26.97 39.45 0.00 > 8.86 > 02:00:38 AM all 23.53 0.00 26.41 45.43 0.00 > 4.63 > 02:00:39 AM all 26.22 0.00 23.72 41.57 0.00 > 8.49 > > > The next longest delay was at 4:00:23 AM: > > heartbeat[9953]: 2009/10/15_04:00:23 WARN: Gmain_timeout_dispatch: > Dispatch function for send local status was delayed 3160 ms (> 1010 ms) > before being called (GSource: 0x9c282c8) > > Unfortunately, my sar script stopped capturing at 3:00, so I don't know > if the load was also high during this period. What I *do* know is that > the longest delay warning occurred when the system load was relatively > low. > > I would appreciate your follow-up thoughts. What would cause heartbeat > to be delayed in sending the local status when the load was low?
That's a kernel/performance issue. Perhaps there's an issue with the storage driver. The fact that the load seems to be low is probably because the metrics you're looking at are misleading. For instance, the first delay occurs around the time the backup client starts and it usually starts by scanning all filesystems which probably means many small storage requests. The later load is probably due to big I/O (file transfer) and it seems to bother the kernel a bit less than the former one. Thanks, Dejan > -- > Eric Robinson > > > Disclaimer - October 15, 2009 > This email and any files transmitted with it are confidential and intended > solely for General Linux-HA mailing list. If you are not the named addressee > you should not disseminate, distribute, copy or alter this email. Any views > or opinions presented in this email are solely those of the author and might > not represent those of Physician Select Management and Physician's Managed > Care. Warning: Although Physician Select Management and Physician's Managed > Care have taken reasonable precautions to ensure no viruses are present in > this email, the companies cannot accept responsibility for any loss or damage > arising from the use of this email or attachments. > This disclaimer was added by Policy Patrol: http://www.policypatrol.com/ > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
