failure of backups after reboot

Chris Hoogendyk Mon, 24 Aug 2009 11:48:38 -0700

This is a recently configured Amanda 2.5.1p3 setup on a group of E250'srunning Solaris 9. It has been running smoothly for a few weeks now.Just before building Amanda, I had done a reconfigure reboot to getdrivers running for a new tape library. This weekend, after the weekendbackups had completed successfully and everything had flushed out totape, I did a `touch /reconfigure; reboot` to get a new drive recognized(should have been a hot swappable drive, but seagate has changed theirfirmware so it doesn't work the way it once did). Anyway, all theservices came up alright as far as I could tell. This is actually adepartmental server with sendmail, apache, samba, etc. all running on it(they couldn't afford the parts to set up a dedicated backup server).


When I looked at the backup reports this morning, this server had:


*** THE DUMPS DID NOT FINISH PROPERLY!

The next tape Amanda expects to use is: geo-daily-05.
The next new tape already labelled is: geo-daily-03.

FAILURE AND STRANGE DUMP SUMMARY:
 eclogite.geo.umass.edu  /                       RESULTS MISSING
    ...<snip>...
 eclogite.geo.umass.edu  /var/mail               RESULTS MISSING
 gis.geo.umass.edu       /                       RESULTS MISSING
    ...<snip>...
 gis.geo.umass.edu       /usr/local              RESULTS MISSING
 mogollon.geo.umass.edu  /                       RESULTS MISSING
    ...<snip>...
 mogollon.geo.umass.edu  /usr/local              RESULTS MISSING
 driver: FATAL Did not get DATE line from planner


The debug files were sparce:

eclogite:/:root# cd /tmp/amanda
eclogite:/tmp/amanda:root# ls -l
total 16
drwx------   3 amandabackup amandabackup     179 Aug 24 00:45 server
eclogite:/tmp/amanda:root# cd server
eclogite:/tmp/amanda/server:root# ls -l
total 16
drwx------   2 amandabackup amandabackup     546 Aug 24 00:45 daily
eclogite:/tmp/amanda/server:root# cd daily
eclogite:/tmp/amanda/server/daily:root# ls -l
total 80
-rw-r-----   1 amandabackup amandabackup     223 Aug 24 00:45 
amlogroll.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     220 Aug 24 00:45 
amreport.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup    3875 Aug 24 00:45 
amtrmidx.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     286 Aug 24 00:45 
amtrmlog.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     281 Aug 24 00:45 
driver.20090824004500.debug

eclogite:/tmp/amanda/server/daily:root# more driver.20090824004500.debugdriver: debug 1 pid 12625 ruid 555 euid 555: start at Mon Aug 24 00:45:00 2009

driver: debug 1 pid 12625 ruid 555 euid 555: rename at Mon Aug 24 00:45:01 2009
driver: time 0.017: Did not get DATE line from planner
driver: time 0.017: pid 12625 finish time Mon Aug 24 00:45:01 2009
eclogite:/tmp/amanda/server/daily:root# date

And there were no debug files on either of the clients for thatdate/time. The cronlog entry shows a return code of 8:

 CMD: /usr/local/sbin/amdump daily -o reserve=100 -o tapedev="" -o tpchanger=""
 amandaba 12613 c Mon Aug 24 00:45:00 2009

<  amandaba 12613 c Mon Aug 24 00:45:02 2009 rc=8

Paradoxically, when I ran an amcheck this morning to see if I could getmore detail, amcheck reported no problems. I've looked over everything Ican think of. Since the critical event here is a server reboot, Iparticularly looked for things that might have been affected by that --supposing, for example, I had set something up by hand, and on a rebootit didn't come back properly. But, amanda isn't susceptible to too muchin that respect, since it runs off cron and isn't dependent on runningdaemons, etc. Various mount points and permissions seemed to be alright,/tmp/amanda was getting debug logs.

I suppose I could have tweaked the amanda entries for inetd incorrectlyand forgot to HUP inetd. Then the reboot would cause it to fail. But theentries present match my other amanda servers. /etc/inetd.conf has thefollowing entry:

amanda dgram udp wait amandabackup /usr/local/libexec/amandad amandad -auth=bsd amdump amindexd amidxtaped


and /etc/services has these entries:

amanda          10080/udp
amandaidx       10082/tcp
amidxtape       10083/tcp

And, amcheck reports no problems right now (and in fact produces debuglogs on each of the clients in /tmp/amanda). So, to repeat myself --this setup has been running smoothly for a couple of weeks. The rebootborked it. I'm stuck. Might it just work tonight? I kind of doubt it.Though something may have gotten poked by the amdump that failed and/orby the amcheck that succeeded.


Here's the output from the amcheck:

eclogite:/usr/local/etc/amanda/daily$ amcheck daily
Amanda Tape Server Host Check
-----------------------------
Holding disk /var/spool/amanda/disk2: 3631598 KB disk space available, using 
2095598 KB
Holding disk /var/spool/amanda/disk1: 3631598 KB disk space available, using 
2095598 KB
Holding disk /var/spool/amanda/disk3: 234427600 KB disk space available, using 
223941840 KB
slot 5: read label `geo-daily-05', date `20090801204500'
NOTE: skipping tape-writable test
Tape geo-daily-05 label ok
Server check took 106.699 seconds

Amanda Backup Client Hosts Check
--------------------------------
Client check: 3 hosts checked in 1.197 seconds, 0 problems found

(brought to you by Amanda 2.5.1p3)

Yeah, I know, the holding disk setup is weird. That's what the new driveis for.

And yup, I've reviewed all my own notes, looked over the troubleshootingon the wiki, and perused all the list archives and forum archives thatgoogle brought up when I searched on the error message "Did not get DATEline from planner". I'll sleep on it too, but that'll take another day.;-) Just thought one of the sharper people on the list might see whatI'm missing.



--
---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center

~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogen...@bio.umass.edu>

---------------

Erdös 4

failure of backups after reboot

Reply via email to