This is a recently configured Amanda 2.5.1p3 setup on a group of E250's
running Solaris 9. It has been running smoothly for a few weeks now.
Just before building Amanda, I had done a reconfigure reboot to get
drivers running for a new tape library. This weekend, after the weekend
backups had completed successfully and everything had flushed out to
tape, I did a `touch /reconfigure; reboot` to get a new drive recognized
(should have been a hot swappable drive, but seagate has changed their
firmware so it doesn't work the way it once did). Anyway, all the
services came up alright as far as I could tell. This is actually a
departmental server with sendmail, apache, samba, etc. all running on it
(they couldn't afford the parts to set up a dedicated backup server).
When I looked at the backup reports this morning, this server had:
*** THE DUMPS DID NOT FINISH PROPERLY!
The next tape Amanda expects to use is: geo-daily-05.
The next new tape already labelled is: geo-daily-03.
FAILURE AND STRANGE DUMP SUMMARY:
eclogite.geo.umass.edu / RESULTS MISSING
...<snip>...
eclogite.geo.umass.edu /var/mail RESULTS MISSING
gis.geo.umass.edu / RESULTS MISSING
...<snip>...
gis.geo.umass.edu /usr/local RESULTS MISSING
mogollon.geo.umass.edu / RESULTS MISSING
...<snip>...
mogollon.geo.umass.edu /usr/local RESULTS MISSING
driver: FATAL Did not get DATE line from planner
The debug files were sparce:
eclogite:/:root# cd /tmp/amanda
eclogite:/tmp/amanda:root# ls -l
total 16
drwx------ 3 amandabackup amandabackup 179 Aug 24 00:45 server
eclogite:/tmp/amanda:root# cd server
eclogite:/tmp/amanda/server:root# ls -l
total 16
drwx------ 2 amandabackup amandabackup 546 Aug 24 00:45 daily
eclogite:/tmp/amanda/server:root# cd daily
eclogite:/tmp/amanda/server/daily:root# ls -l
total 80
-rw-r----- 1 amandabackup amandabackup 223 Aug 24 00:45
amlogroll.20090824004501.debug
-rw-r----- 1 amandabackup amandabackup 220 Aug 24 00:45
amreport.20090824004501.debug
-rw-r----- 1 amandabackup amandabackup 3875 Aug 24 00:45
amtrmidx.20090824004501.debug
-rw-r----- 1 amandabackup amandabackup 286 Aug 24 00:45
amtrmlog.20090824004501.debug
-rw-r----- 1 amandabackup amandabackup 281 Aug 24 00:45
driver.20090824004500.debug
eclogite:/tmp/amanda/server/daily:root# more driver.20090824004500.debug
driver: debug 1 pid 12625 ruid 555 euid 555: start at Mon Aug 24 00:45:00 2009
driver: debug 1 pid 12625 ruid 555 euid 555: rename at Mon Aug 24 00:45:01 2009
driver: time 0.017: Did not get DATE line from planner
driver: time 0.017: pid 12625 finish time Mon Aug 24 00:45:01 2009
eclogite:/tmp/amanda/server/daily:root# date
And there were no debug files on either of the clients for that
date/time. The cronlog entry shows a return code of 8:
CMD: /usr/local/sbin/amdump daily -o reserve=100 -o tapedev="" -o tpchanger=""
amandaba 12613 c Mon Aug 24 00:45:00 2009
< amandaba 12613 c Mon Aug 24 00:45:02 2009 rc=8
Paradoxically, when I ran an amcheck this morning to see if I could get
more detail, amcheck reported no problems. I've looked over everything I
can think of. Since the critical event here is a server reboot, I
particularly looked for things that might have been affected by that --
supposing, for example, I had set something up by hand, and on a reboot
it didn't come back properly. But, amanda isn't susceptible to too much
in that respect, since it runs off cron and isn't dependent on running
daemons, etc. Various mount points and permissions seemed to be alright,
/tmp/amanda was getting debug logs.
I suppose I could have tweaked the amanda entries for inetd incorrectly
and forgot to HUP inetd. Then the reboot would cause it to fail. But the
entries present match my other amanda servers. /etc/inetd.conf has the
following entry:
amanda dgram udp wait amandabackup /usr/local/libexec/amandad amandad -auth=bsd amdump amindexd amidxtaped
and /etc/services has these entries:
amanda 10080/udp
amandaidx 10082/tcp
amidxtape 10083/tcp
And, amcheck reports no problems right now (and in fact produces debug
logs on each of the clients in /tmp/amanda). So, to repeat myself --
this setup has been running smoothly for a couple of weeks. The reboot
borked it. I'm stuck. Might it just work tonight? I kind of doubt it.
Though something may have gotten poked by the amdump that failed and/or
by the amcheck that succeeded.
Here's the output from the amcheck:
eclogite:/usr/local/etc/amanda/daily$ amcheck daily
Amanda Tape Server Host Check
-----------------------------
Holding disk /var/spool/amanda/disk2: 3631598 KB disk space available, using
2095598 KB
Holding disk /var/spool/amanda/disk1: 3631598 KB disk space available, using
2095598 KB
Holding disk /var/spool/amanda/disk3: 234427600 KB disk space available, using
223941840 KB
slot 5: read label `geo-daily-05', date `20090801204500'
NOTE: skipping tape-writable test
Tape geo-daily-05 label ok
Server check took 106.699 seconds
Amanda Backup Client Hosts Check
--------------------------------
Client check: 3 hosts checked in 1.197 seconds, 0 problems found
(brought to you by Amanda 2.5.1p3)
Yeah, I know, the holding disk setup is weird. That's what the new drive
is for.
And yup, I've reviewed all my own notes, looked over the troubleshooting
on the wiki, and perused all the list archives and forum archives that
google brought up when I searched on the error message "Did not get DATE
line from planner". I'll sleep on it too, but that'll take another day.
;-) Just thought one of the sharper people on the list might see what
I'm missing.
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogen...@bio.umass.edu>
---------------
Erdös 4