On Wednesday 26 February 2014 10:39:13 Gene Heskett did opine:

> Greetings;
>
> 3 backups ago, with no change to the amanda.conf in months, I have
> awakened to a hung tar task using 100% of a core, more than 5 hours
> after it should have completed.
>
> It is in that state now. How can I find what is causing this blockage?
> Here is the report from yesterdays attempt, received after I had used
> htop to send this stuck tar instance a normal quit signal.
>
> These dumps were to tape Dailys-9.
> The next 2 tapes Amanda expects to use are: Dailys-10, Dailys-11.
> FAILURE DUMP SUMMARY:
> planner: ERROR Some estimate timeout on coyote, using server estimate if possible
> coyote /CoCo lev 0 FAILED [too many dumper retry: [request failed: Connection timed out]]
> coyote /GenesAmandaHelper-0.61 lev 1 FAILED [too many dumper retry: [request failed: Connection timed out]]
> coyote /home lev 2 FAILED [too many dumper retry: [request failed: Connection timed out]]
> coyote /lib lev 0 FAILED [disk /lib, all estimate timed out]
> coyote /opt lev 0 FAILED [disk /opt, all estimate timed out]
> coyote /root lev 0 FAILED [disk /root, all estimate timed out]
> coyote /sbin lev 0 FAILED [disk /sbin, all estimate timed out]
> coyote /var lev 0 FAILED [disk /var, all estimate timed out]
> coyote /usr/bin lev 0 FAILED [disk /usr/bin, all estimate timed out]
> coyote /usr/dlds/misc lev 0 FAILED [disk /usr/dlds/misc, all estimate timed out]
> coyote /usr/dlds/tgzs lev 0 FAILED [disk /usr/dlds/tgzs, all estimate timed out]
> coyote /usr/dlds/books lev 0 FAILED [disk /usr/dlds/books, all estimate timed out]
> coyote /usr/include lev 0 FAILED [disk /usr/include, all estimate timed out]
> coyote /usr/lib lev 0 FAILED [disk /usr/lib, all estimate timed out]
> coyote /usr/libexec lev 0 FAILED [disk /usr/libexec, all estimate timed out]
> coyote /usr/movies lev 0 FAILED [disk /usr/movies, all estimate timed out]
> coyote /usr/local lev 0 FAILED [disk /usr/local, all estimate timed out]
> coyote /usr/music lev 0 FAILED [disk /usr/music, all estimate timed out]
> coyote /usr/pix lev 0 FAILED [disk /usr/pix, all estimate timed out]
> coyote /usr/sbin lev 0 FAILED [disk /usr/sbin, all estimate timed out]
> coyote /usr/share lev 0 FAILED [disk /usr/share, all estimate timed out]
> coyote /usr/src lev 0 FAILED [disk /usr/src, all estimate timed out]
> coyote /usr/games lev 0 FAILED [disk /usr/games, all estimate timed out]
> coyote /CoCo lev 0 FAILED Got empty header
> coyote /CoCo lev 0 FAILED Got empty header
> coyote /GenesAmandaHelper-0.61 lev 1 FAILED Got empty header
> coyote /GenesAmandaHelper-0.61 lev 1 FAILED Got empty header
> coyote /boot lev 0 FAILED Got empty header
> coyote /home lev 2 FAILED Got empty header
> coyote /home lev 2 FAILED Got empty header
>
> However, at the bottom of the report, the remote systems were backed up
> just fine.
>
> lathe  /home            1  1  0   5.6  0:00  169.9  0:36     1.2
> lathe  /usr/lib/amanda  1  0  0   3.3  0:05    0.4  0:00    10.0
> lathe  /usr/local       1  0  0   2.0  0:05    0.4  0:00    10.0
> lathe  /var/lib/amanda  1  0  0  22.0  0:00  354.6  0:00   220.0
> shop   /home            3  4  0   8.2  0:07   43.6  0:00  3080.0
> shop   /usr/lib/amanda  1  0  0   3.3  0:05    0.4  0:00    10.0
> shop   /usr/local       1  0  0   2.0  0:05    0.4  0:00    10.0
> shop   /var/lib/amanda  1  2  0  17.8  0:01  584.4  0:00  2950.0
>
> (brought to you by Amanda version 4.0.0alpha.svn.4761)
>
> Now, the thing that _has_ changed is the running kernel, from a 3.12.9
> that seemed to work well with amanda, to a 3.13.5 that I had one heck
> of a time building because of Kconfig dependency errors that caused all
> of the many "media" options to disappear from the "make ?config"
> operations, and it is likely this one could be missing something that
> tar needs.
>
> So, what, from this, would be the most likely candidate? The config.gz
> is attached.
>
> Thank you very much for any insight that can be determined from this.
>
> Cheers, Gene

Ping!

In the meantime I have rebuilt this kernel 3 times, getting an
unbootable one once, but without finding the option that seems to throw
tar into a forever loop.

FWIW, when tar is in that state, the only drive activity comes from
fetchmail, which runs every 3 minutes. Tar apparently gets stuck
hammering on something it can't access. And yet the DLE it appears to
be stuck on while attempting an estimate, /lib, can be listed with
ls -laR with no problems.

This is the distro's copy of tar-1.22, but I have no clue what options
it was compiled with. Is 1.22 a known bad actor under some conditions?

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.
