John,

I have made some progress on this problem, which is why it has taken so long
to respond to your last message. I first swapped out my primary SCSI card,
which drives my system and dump data disks and the tape jukebox. I found
that my self-terminating internal drive was not actually terminating, so I
swapped the cable for one with an attached terminator. This allowed me to get
a clean backup of a larger partition on the system out to tape and back,
something I had not realized I was having a problem with until recently. I
then compiled in your driver patch and tried another Amanda run. It failed
with exactly the same problem, and your patch did not appear to give any
additional data. So, just to be absolutely sure that I had everything clean,
I rebuilt the system from scratch (again), but this time I did not run
Bastille on it. I then set up an identical Amanda run and it worked
flawlessly. I then ran Bastille again, but did not allow it to restrict
system resources, and everything still works flawlessly. I had forgotten
until late last week that I had chosen that option the first time. Everything
appears to be working now. Thank you for all your help!

        markh

-----Original Message-----
From: John R. Jackson [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 14, 2001 10:57 AM
To: Mark Holm
Cc: '[EMAIL PROTECTED]'
Subject: Re: Need help figuring out why these don't complete 


>... that tells me that it is not partition dependent at least.  ...

Agreed.  Not sure if that's good or bad :-).

>I am not getting any core files ...

OK.

Please try the following patch.  It will not fix anything, but will log
the exit code of all the processes driver starts, in particular, the
messed up dumpers.  Actually, it will only log them if they indicate some
kind of failure, but I'd be surprised if they are going away cleanly.
The messages will go to the E-mail report and also to the tail end
of amdump.1.  Please post the last 30 or so lines of amdump.1 when you
get the next failure.

My guess at the moment is that dumper got a signal of some type that
caused it to exit.  That should show up in the patched output.
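
As a rough illustration of the distinction being looked for here (a clean
exit, a non-zero exit status, or death by signal), here is a minimal Python
sketch. It is not Amanda code and not the patch itself; the helper name
describe_exit and the self-killing child are hypothetical, and it assumes a
POSIX system, where subprocess reports a signal death as a negative return
code.

```python
import subprocess
import sys


def describe_exit(returncode):
    """Describe how a child ended, or return None if it exited cleanly.

    Assumes the POSIX convention used by subprocess: a negative return
    code -N means the child was terminated by signal N.
    """
    if returncode == 0:
        return None  # clean exit, nothing worth logging
    if returncode < 0:
        return f"exited with signal {-returncode}"
    return f"exited with status {returncode}"


def demo():
    # Hypothetical child that kills itself with SIGTERM, standing in for
    # a process that goes away unexpectedly.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
    )
    print(describe_exit(child.returncode))


if __name__ == "__main__":
    demo()
```

Only failures produce a message in this sketch, which mirrors the behaviour
described above of logging exit codes only when they indicate a problem.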

Assuming that's what happens, you might also pick one or more of the
"exited with signal NN" lines and look up signal number "NN" in your
/usr/include/sys/signal.h file and let me know what they are, just to
make sure we don't start talking about apples and oranges because of
OS differences.
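
Besides reading signal.h by hand, the number-to-name mapping for the local
OS can also be checked with a few lines of Python. This is only a sketch of
one way to do the lookup; the helper name signal_name is hypothetical, and
the numbers printed are whatever the platform running it defines.

```python
import signal


def signal_name(signum):
    """Return this platform's name for signal number signum, or None."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # no signal with that number on this OS


if __name__ == "__main__":
    # Print the full table so "NN" from a log line can be looked up.
    for sig in sorted(signal.Signals):
        print(int(sig), sig.name)
```

Because signal numbers differ between operating systems, the lookup has to
be run on the same machine that produced the log lines.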

>... If you go to the end of the file at the point where it starts
>doing the first of the backups that failed, for some reason it starts
>backing directly to tape.  ...

Huh?  I don't see that in the log you sent.  All the commands I see
are FILE-DUMP.  Which file are you talking about (what "start at" from
the first line)?  Or what makes you think it was direct to tape?

In any case, it doesn't really matter.  All the other failures I've seen
in your logs are through the holding disk, so even if this one was direct
to tape, that does not seem to be related.

>       markh

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]

(Run "e mhn" at "What now?", then $}base64 -d as needed and remove the
Content-Transfer-Encoding line (leave it blank) and these lines.)
