I just had a vaguely similar issue on my network.  My Amanda server stopped
processing, and everything failed with 'RESULTS MISSING'.  Nothing I tested on
the Amanda server worked, and I had several zombie amandad processes stuck on
it.  I rebooted the Amanda server and started checking the clients; one client
in particular caused the whole problem.  That client, call it A, was using two
NFS mounts from another client, B.  Client A, which is managed by another
group, had supposedly stopped using those NFS mounts, so when I was told to
decommission client B, I did so.  This left client A's NFS mounts from client B
in a wedged state, as NFS always does in this situation.  What bit me and
Amanda is that we were using client A's mounts to back up that content from
client B, so when my server tried to talk to A's amandad, both of them locked
up.  (To really complicate matters, those two NFS shares were themselves iSCSI
mounts to client B from a SAN.)

We had to edit /etc/fstab on A, delete the bad mounts from decommissioned
client B, then reboot A.

Now my backups are working again.

Some days, NFS just isn't worth using.
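
If anyone wants to catch this before Amanda trips over it, here is a rough
sketch in Python of how one might probe NFS mounts with a timeout.  It is only
an illustration: it assumes a Linux-style /proc/mounts on the client, and the
function names and the 5-second timeout are my own placeholders.  The probe
runs in a child process so that a wedged mount hangs the child rather than the
checker itself.

```python
import subprocess
import sys

# Filesystem types treated as NFS (assumption: Linux /proc/mounts naming).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mount_points(mounts_text):
    """Return the mount points of NFS filesystems in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fstype, options, dump, pass.
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points


def is_responsive(path, timeout=5.0):
    """Stat `path` in a child process; False if the child fails or hangs."""
    probe = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    child = subprocess.Popen(
        probe, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: on a hard mount the child may stay stuck
        # until the mount recovers, so we do not wait for it again.
        child.kill()
        return False


def report(mounts_file="/proc/mounts"):
    """Map each NFS mount point to True (responsive) or False."""
    with open(mounts_file) as handle:
        mount_points = nfs_mount_points(handle.read())
    return {point: is_responsive(point) for point in mount_points}
```

Calling report() from cron ahead of the backup window would flag a mount that
no longer answers.  It is a detector, not a fix: the stale entries still have
to come out of /etc/fstab.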

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Chris Hoogendyk
Sent: Wednesday, January 21, 2015 15:40
To: Jean-Louis Martineau
Cc: AMANDA users
Subject: Re: amcheck 0 problems, but timeout on reply pipe

If some of the DLEs were fully estimated in a short time, would those also fail
just because other DLEs on the same host caused long delays?

I just find it odd that things were working smoothly up to the 16th and then 
consistently and completely failing after the 16th.

I'd be perfectly happy to extend the timeout if that were the issue.
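
To see whether the timeout is even a plausible cause, the per-DLE estimate
times in the sendsize debug file can be totalled and compared with etimeout.
Below is a rough Python sketch; the function names are made up for
illustration, and the regular expressions assume the one-entry-per-line layout
of the original debug file, not a mail-wrapped copy.  As I understand
amanda.conf(5), a positive etimeout is applied per DLE and a negative value is
a total per client, but that is worth checking against the man page for the
version in use.

```python
import re
from datetime import datetime

# Matches e.g. "sendsize: estimate time for /export/baja level 0: 169.954"
ESTIMATE_RE = re.compile(r"estimate time for (\S+) level (\d+): ([\d.]+)")

# Matches the timestamp that starts each debug entry,
# e.g. "Tue Jan 20 23:30:05 2015:"
STAMP_RE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}):", re.MULTILINE
)


def estimate_times(log_text):
    """Return (amname, level, seconds) for every estimate in the log."""
    return [
        (name, int(level), float(seconds))
        for name, level, seconds in ESTIMATE_RE.findall(log_text)
    ]


def wall_clock_seconds(log_text):
    """Seconds between the earliest and latest entry timestamps."""
    stamps = [
        datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        for stamp in STAMP_RE.findall(log_text)
    ]
    if not stamps:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()
```

Reading the debug file into a string and passing it to both functions gives
the slowest DLEs and the overall elapsed time, which can then be set against
whatever etimeout allows for that host.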


On 1/21/15 4:04 PM, Jean-Louis Martineau wrote:
> The estimate took more than 6 hours!
> After how much time do you get the timeout error?
> What is your etimeout setting?
>
> You could try a faster estimate method: calcsize or server.
>
> Jean-Louis
>
> On 01/21/2015 03:52 PM, Chris Hoogendyk wrote:
>> First note that regardless of the timing for any particular DLE, 
>> *every* single DLE for this one host is failing, while all other hosts are 
>> getting fully backed up without any trouble.
>>
>> On the host that is failing, there are no sendbackup debug files since it 
>> started failing.
>>
>> On that same host, the sendsize debug file from last night includes (this is 
>> just a segment):
>>
>> "/tmp/amanda/client/daily/sendsize.20150120233005.debug" 1165 lines, 145734 characters
>> Tue Jan 20 23:30:05 2015: thd-32a58: sendsize: pid 11633 ruid 555 euid 555 version 3.3.2: start at Tue Jan 20 23:30:05 2015
>> Tue Jan 20 23:30:05 2015: thd-32a58: sendsize: version 3.3.2
>> Tue Jan 20 23:30:05 2015: thd-32a58: sendsize: pid 11633 ruid 555 euid 555 version 3.3.2: rename at Tue Jan 20 23:30:05 2015
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: waiting for any estimate child: 2 running
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: calculating for amname /, dirname /, spindle 100 DUMP
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: getting size via dump for / level 0
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: calculating for amname /export/baja, dirname /export/baja, spindle 45010 GNUTAR
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: getting size via gnutar for /export/baja level 0
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: calculating for device /dev/rdsk/c1t0d0s0 with ufs
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: running "/usr/local/etc/amanda/tools/ufsdump 0Ssf 1048576 - /dev/rdsk/c1t0d0s0"
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: running /usr/local/libexec/amanda/killpgrp
>> Tue Jan 20 23:30:08 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/baja --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_export_baja_0.new --sparse --ignore-failed-read --totals ." in pipeline
>> Tue Jan 20 23:30:19 2015: thd-32a58: sendsize: 13847108608
>> Tue Jan 20 23:30:19 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:30:19 2015: thd-32a58: sendsize: estimate time for / level 0: 11.540
>> Tue Jan 20 23:30:19 2015: thd-32a58: sendsize: estimate size for / level 0: 13522567 KB
>> Tue Jan 20 23:30:19 2015: thd-32a58: sendsize: asking killpgrp to terminate
>> Tue Jan 20 23:30:20 2015: thd-32a58: sendsize: getting size via dump for / level 1
>> Tue Jan 20 23:30:20 2015: thd-32a58: sendsize: calculating for device /dev/rdsk/c1t0d0s0 with ufs
>> Tue Jan 20 23:30:20 2015: thd-32a58: sendsize: running "/usr/local/etc/amanda/tools/ufsdump 1Ssf 1048576 - /dev/rdsk/c1t0d0s0"
>> Tue Jan 20 23:30:20 2015: thd-32a58: sendsize: running /usr/local/libexec/amanda/killpgrp
>> Tue Jan 20 23:32:33 2015: thd-32a58: sendsize: 1461860352
>> Tue Jan 20 23:32:33 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:32:33 2015: thd-32a58: sendsize: estimate time for / level 1: 133.065
>> Tue Jan 20 23:32:33 2015: thd-32a58: sendsize: estimate size for / level 1: 1427598 KB
>> Tue Jan 20 23:32:33 2015: thd-32a58: sendsize: asking killpgrp to terminate
>> Tue Jan 20 23:32:34 2015: thd-32a58: sendsize: done with amname / dirname / spindle 100
>> Tue Jan 20 23:32:34 2015: thd-32a58: sendsize: waiting for any estimate child: 2 running
>> Tue Jan 20 23:32:34 2015: thd-32a58: sendsize: calculating for amname /archive, dirname /archive, spindle 100 GNUTAR
>> Tue Jan 20 23:32:34 2015: thd-32a58: sendsize: getting size via gnutar for /archive level 0
>> Tue Jan 20 23:32:34 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /archive --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_archive_0.new --sparse --ignore-failed-read --totals ." in pipeline
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: Total bytes written: 10917795840 (11GiB, 63MiB/s)
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: estimate time for /export/baja level 0: 169.954
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: estimate size for /export/baja level 0: 10661910 KB
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: waiting for runtar "/export/baja" child
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: after runtar /export/baja wait
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: getting size via gnutar for /export/baja level 1
>> Tue Jan 20 23:32:58 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/baja --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_export_baja_1.new --sparse --ignore-failed-read --totals ." in pipeline
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: Total bytes written: 133120 (130KiB, 39KiB/s)
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: estimate time for /export/baja level 1: 3.525
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: estimate size for /export/baja level 1: 130 KB
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: waiting for runtar "/export/baja" child
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: after runtar /export/baja wait
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: done with amname /export/baja dirname /export/baja spindle 45010
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: waiting for any estimate child: 2 running
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: calculating for amname /export/barbados, dirname /export/barbados, spindle 45010 GNUTAR
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: getting size via gnutar for /export/barbados level 0
>> Tue Jan 20 23:33:01 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/barbados --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_export_barbados_0.new --sparse --ignore-failed-read --totals ." in pipeline
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: Total bytes written: 11691386880 (11GiB, 51MiB/s)
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: estimate time for /export/barbados level 0: 222.591
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: estimate size for /export/barbados level 0: 11417370 KB
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: waiting for runtar "/export/barbados" child
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: after runtar /export/barbados wait
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: getting size via gnutar for /export/barbados level 1
>> Tue Jan 20 23:36:44 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/barbados --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_export_barbados_1.new --sparse --ignore-failed-read --totals ." in pipeline
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: Total bytes written: 153600 (150KiB, 125KiB/s)
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: .....
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: estimate time for /export/barbados level 1: 1.378
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: estimate size for /export/barbados level 1: 150 KB
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: waiting for runtar "/export/barbados" child
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: after runtar /export/barbados wait
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: done with amname /export/barbados dirname /export/barbados spindle 45010
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: waiting for any estimate child: 2 running
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: calculating for amname /export/bermuda, dirname /export/bermuda, spindle 45010 GNUTAR
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: getting size via gnutar for /export/bermuda level 0
>> Tue Jan 20 23:36:45 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/bermuda --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_export_bermuda_0.new --sparse --ignore-failed-read --totals ." in pipeline
>>
>> SKIPPING A THOUSAND OR SO LINES . . . .
>>
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: .....
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: estimate time for /u1/home/micro/./k level 0: 3245.840
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: estimate size for /u1/home/micro/./k level 0: 45555590 KB
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: waiting for runtar "/u1/home/micro/./k" child
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: after runtar /u1/home/micro/./k wait
>> Wed Jan 21 04:40:50 2015: thd-32a58: sendsize: getting size via gnutar for /u1/home/micro/./k level 1
>> Wed Jan 21 04:40:51 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /u1/home/micro --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_u1_home_micro_._k_1.new --sparse --ignore-failed-read --totals --files-from /tmp/amanda/sendsize._u1_home_micro_._k.20150121044050.include" in pipeline
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: Total bytes written: 6410967040 (6.0GiB, 9.6MiB/s)
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: .....
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: estimate time for /u1/home/micro/./k level 1: 640.501
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: estimate size for /u1/home/micro/./k level 1: 6260710 KB
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: waiting for runtar "/u1/home/micro/./k" child
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: after runtar /u1/home/micro/./k wait
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: done with amname /u1/home/micro/./k dirname /u1/home/micro spindle 506
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: waiting for any estimate child: 1 running
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: calculating for amname /u1/home/micro/./l-z, dirname /u1/home/micro, spindle 506 GNUTAR
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: getting size via gnutar for /u1/home/micro/./l-z level 0
>> Wed Jan 21 04:51:31 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /u1/home/micro --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_u1_home_micro_._l-z_0.new --sparse --ignore-failed-read --totals --files-from /tmp/amanda/sendsize._u1_home_micro_._l-z.20150121045131.include" in pipeline
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: Total bytes written: 18660341760 (18GiB, 13MiB/s)
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: .....
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: estimate time for /u1/home/micro/./l-z level 0: 1475.354
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: estimate size for /u1/home/micro/./l-z level 0: 18222990 KB
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: waiting for runtar "/u1/home/micro/./l-z" child
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: after runtar /u1/home/micro/./l-z wait
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: getting size via gnutar for /u1/home/micro/./l-z level 1
>> Wed Jan 21 05:16:06 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /u1/home/micro --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_u1_home_micro_._l-z_1.new --sparse --ignore-failed-read --totals --files-from /tmp/amanda/sendsize._u1_home_micro_._l-z.20150121051606.include" in pipeline
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: Total bytes written: 8945582080 (8.4GiB, 11MiB/s)
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: .....
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: estimate time for /u1/home/micro/./l-z level 1: 782.213
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: estimate size for /u1/home/micro/./l-z level 1: 8735920 KB
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: waiting for runtar "/u1/home/micro/./l-z" child
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: after runtar /u1/home/micro/./l-z wait
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: getting size via gnutar for /u1/home/micro/./l-z level 2
>> Wed Jan 21 05:29:09 2015: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /u1/home/micro --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/marlin.bio.mor.nsm_u1_home_micro_._l-z_2.new --sparse --ignore-failed-read --totals --files-from /tmp/amanda/sendsize._u1_home_micro_._l-z.20150121052909.include" in pipeline
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: Total bytes written: 8937635840 (8.4GiB, 11MiB/s)
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: .....
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: estimate time for /u1/home/micro/./l-z level 2: 786.054
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: estimate size for /u1/home/micro/./l-z level 2: 8728160 KB
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: waiting for runtar "/u1/home/micro/./l-z" child
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: after runtar /u1/home/micro/./l-z wait
>> Wed Jan 21 05:42:15 2015: thd-32a58: sendsize: done with amname /u1/home/micro/./l-z dirname /u1/home/micro spindle 506
>>
>> AND THAT IS THE END OF THE SENDSIZE DEBUG FILE
>>
>>
>>
>> On 1/21/15 3:02 PM, Jean-Louis Martineau wrote:
>>> Do you get the error at estimate or at backup time?
>>> Look at the times in the sendsize and sendbackup debug files to find which 
>>> one is slow.
>>>
>>> On 01/21/2015 02:40 PM, Chris Hoogendyk wrote:
>>>> Folks,
>>>>
>>>> I have an Ubuntu 14.04 LTS system running Amanda 3.3.6 server 
>>>> backing up a Solaris 10 system with Amanda 3.3.2.
>>>>
>>>> I had it working and I was getting backups of that particular 
>>>> Solaris system. Then I suddenly started getting the "timeout on 
>>>> reply pipe" on every single DLE on that system, but not on any 
>>>> other systems. There is also another virtually identical Solaris 
>>>> system (except with Amanda
>>>> 2.5.1p3) that has continued getting backed up as well as a number 
>>>> of Ubuntu systems with various versions of Ubuntu (10.04LTS, 
>>>> 12.04LTS, or 14.04LTS) and Amanda (either 2.5.1p3, 3.3.2, or 3.3.6).
>>>>
>>>> If I run `amcheck -c daily`, I get 0 problems.
>>>>
>>>> How do I troubleshoot this? Why would it have suddenly come up 
>>>> (last Friday) and then been consistently non functional? (Whereas 
>>>> before it was consistently functional). I've poked through the 
>>>> /tmp/amanda debug logs, but haven't been able to identify any errors that 
>>>> would tell me what was wrong.
>>>>
>>>> I should note that most of these servers are in the same two 
>>>> adjacent racks and have GigE connections to the same switch.
>>>>
>>>> The server that is not getting backed up at present is our main 
>>>> departmental server that is running mail services, web, file 
>>>> shares, printing, anonymous ftp, mysql, etc. for a fairly active 
>>>> department.
>>>>
>>>>
>>>
>>
>

--
---------------

Chris Hoogendyk

-
    O__  ---- Systems Administrator
   c/ /'_ --- Biology & Geology Departments
  (*) \(*) -- 347 Morrill Science Center
 ~~~~~~~~~~ - University of Massachusetts, Amherst

<[email protected]>

---------------

Erdös 4

