Guy,

The reporting issue in amstatus (reporting negative value) is already fix, check the svn tree.

I'm surprised that you complain about a small amstatus reporting issue but not about the fact that your backup failed, you should have at least posted the error from the email report.

I found the following in the amdump.1 file you sent me privately:
GLib-CRITICAL **: g_main_context_find_source_by_id: assertion `source_id > 0' failed

It's the first time I see it, I think it is related to amanda exceeding some glib capacity because your dump are really large.
Are you using a 32 bits amanda?
Can you try the attached patch?

Jean-Louis

On 10/01/2012 09:21 AM, Guy Sisalli wrote:
We're noticing peculiar behavior running Amanda 3.3.2 under FreeBSD 9. Backups appear to work fine until dealing with a sizable DLE consisting of a single zfs snapshot. We ran a backup without a holding area and dumped directly to tape. Things went okay, and then the 10 TB DLE backup suddenly went wrong at the 6-7 TB mark.

At somewhere between 6 and 7 TB, while writing to tape 6, the readout on amstatus displayed negative values (see below). At this point, the 10 TB DLE was still listed as being sent to tape, but no further progress was being registered on that DLE.

Instead, amanda had moved on to dump a second DLE to tape. We only have one tape drive, but amstatus was listing two DLEs as being sent to tape, and only one was making progress.

After all the other DLEs had finished, work resumed on the 10 TB DLE, but amanda claimed to be writing more than 100% of the DLE to tape. The same negative write symptom appeared on this last tape. Is this some sort of known issue?

Using /usr/local/etc/amanda/dailyset/state/log/amdump.1
From Fri Sep 28 17:16:47 EDT 2012

localhost:/transport/alpha          0        46g finished (14:58:32)
localhost:/transport/bravo          0         0g finished (17:20:20)
localhost:/transport/charlie        0       278g finished (16:37:19)
localhost:/transport/delta          0       202g finished (15:51:05)
localhost:/transport/echo           0      1674g finished (1+21:17:28)
localhost:/transport/foxtrot        0         0g finished (14:48:18)
localhost:/transport/golf           0         0g finished (14:47:45)
localhost:/transport/hotel          0      2453g finished (1+4:01:32)
localhost:/transport/india          0        63g finished (15:17:03)
localhost:/transport/juliet         0        16g finished (14:51:02)
localhost:/transport/kilo           0        49g finished (15:06:35)
localhost:/transport/lima           0         0g finished (14:48:00)
localhost:/transport/mike 0 10052g failed: killed while dumping to tape (10578g done (105.24%)) (1+12:14:40)

SUMMARY          part      real  estimated
                           size       size
partition       :  13
estimated       :  13                14838g
flush           :   0         0g
failed          :   0                    0g           (  0.00%)
wait for dumping:   0                    0g           (  0.00%)
dumping to tape :   1     10578g     10052g (105.24%) ( 67.74%)
dumping         :   0         0g         0g (  0.00%) (  0.00%)
dumped          :  12      4786g      4786g (100.00%) ( 32.26%)
wait for writing:   0         0g         0g (  0.00%) (  0.00%)
wait to flush   :   0         0g         0g (100.00%) (  0.00%)
writing to tape :   0         0g         0g (  0.00%) (  0.00%)
failed to tape  :   0         0g         0g (  0.00%) (  0.00%)
taped           :  12      4786g      4786g (100.00%) ( 32.26%)
tape 1 : 1 1524g 1524g ( 69.32%) Aarc00054 (46 chunks) tape 2 : 0 1508g 1508g ( 68.58%) Aarc00056 (44 chunks) tape 3 : 0 1504g 1504g ( 68.37%) Aarc00057 (43 chunks) tape 4 : 0 1506g 1506g ( 68.47%) Aarc00058 (44 chunks) tape 5 : 0 1473g 1473g ( 66.97%) Aarc00044 (43 chunks) tape 6 : 10 -5978g 4073g (-271.75%) Aarc00043 (65 chunks) tape 7 : 1 2395g 2395g (108.89%) Aarc00041 (70 chunks) tape 8 : 1 1950g 1950g ( 88.65%) Aarc00042 (58 chunks) tape 9 : 0 1489g 1489g ( 67.70%) Aarc00049 (43 chunks) tape 10 : 1 -2588g 7463g (-117.64%) Aarc00050 (11 chunks)
3 dumpers idle  : no-dumpers
taper status: Idle
taper qlen: 0
network free kps:    158000
holding space   :         0g (  0.00%)
 dumper0 busy   :  8:13:08  ( 19.13%)
 dumper1 busy   : 21:26:17  ( 49.90%)
   taper busy   : 1+18:54:12  ( 99.86%)
0 dumpers busy : 0:09:11 ( 0.36%) no-dumpers: 0:05:53 ( 64.13%) not-idle: 0:03:17 ( 35.86%) 1 dumper busy : 22:10:24 ( 51.61%) no-dumpers: 22:10:24 (100.00%) 2 dumpers busy : 20:38:18 ( 48.04%) no-dumpers: 20:38:18 (100.00%)

diff --git a/server-src/dumper.c b/server-src/dumper.c
index 7712887..adb90ea 100644
--- a/server-src/dumper.c
+++ b/server-src/dumper.c
@@ -1867,25 +1867,33 @@ handle_filter_stderr(
  * Startup a timeout in the event handler.  If the arg is 0,
  * then remove the timeout.
  */
+static event_handle_t *ev_timeout = NULL;
+static time_t timeout_time;
+
 static void
 timeout(
     time_t seconds)
 {
-    static event_handle_t *ev_timeout = NULL;
+    timeout_time = time(NULL) + seconds;
 
     /*
-     * First, remove a timeout if one is active.
+     * remove a timeout if seconds is 0
      */
-    if (ev_timeout != NULL) {
-	event_release(ev_timeout);
-	ev_timeout = NULL;
+    if (seconds == 0) {
+	if (ev_timeout != NULL) {
+	    event_release(ev_timeout);
+	    ev_timeout = NULL;
+	}
+	return;
     }
 
     /*
-     * Now, schedule a new one if 'seconds' is greater than 0
+     * schedule a timeout if it not already scheduled
      */
-    if (seconds > 0)
-	ev_timeout = event_register((event_id_t)seconds, EV_TIME, timeout_callback, NULL);
+    if (ev_timeout == NULL) {
+	ev_timeout = event_register((event_id_t)seconds, EV_TIME,
+				     timeout_callback, NULL);
+    }
 }
 
 /*
@@ -1896,8 +1904,20 @@ static void
 timeout_callback(
     void *	unused)
 {
+    time_t now = time(NULL);
     (void)unused;	/* Quiet unused parameter warning */
 
+    if (ev_timeout != NULL) {
+	event_release(ev_timeout);
+	ev_timeout = NULL;
+    }
+
+    if (timeout_time > now) { /* not a data timeout yet */
+	ev_timeout = event_register((event_id_t)(timeout_time-now), EV_TIME,
+				    timeout_callback, NULL);
+	return;
+    }
+
     assert(unused == NULL);
     g_free(errstr);
     errstr = g_strdup(_("data timeout"));

Reply via email to