On Thu, Jul 13, 2017 at 04:07:19PM +0100, Jose M Calhariz wrote:
> 
> Hi,
> 
> I have another installation of amanda.  This one is very big: 120
> hosts, 750 DLEs, and using ssh authentication.  Is anyone using an
> installation of this size?
> 
> My problem is that this setup works for some days without problems and
> on other days it cannot back up all the servers.
> 
> I have investigated the clients.  When there are problems, nothing
> tries to contact the faulty clients; no traffic is generated from the
> server to the client.
> 
> Now I am trying to make sense of the server logs.  Looking into the
> planner logs I see messages from the successful servers but not the
> names of the faulty servers.  Looking into /var/log/amanda/Daily I see
> messages in the log and amdump files requesting estimates and, in the
> same second, saying:
> 
> 
> amdump.20170713000603:planner: time 0.055: setting up estimates for hostanme.domain.name:/
> amdump.20170713000603:setup_estimate: hostanme.domain.name:/: command 0, options: none    last_level 1 next_level0 5 level_days 6    getting estimates 0 (-3) 1 (-3) 2 (-3)
> amdump.20170713000603:planner: time 0.055: setting up estimates for hostanme.domain.name:/boot
> amdump.20170713000603:setup_estimate: hostanme.domain.name:/boot: command 0, options: none    last_level 1 next_level0 4 level_days 7    getting estimates 0 (-3) 1 (-3) -1 (-3)
> amdump.20170713000603:planner: FAILED hostanme.domain.name / 20170713000603 0 "[hmm, no error indicator!]"
> amdump.20170713000603:planner: FAILED hostanme.domain.name /boot 20170713000603 0 "[hmm, no error indicator!]"
> 
> I am out of ideas about how to find a possible reason for the
> failure.  Can anyone help me?

If I comment out entries in the disklist, the list of failed machines changes.
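
To compare which machines fail from one run to the next, the planner
FAILED lines can be pulled out of the amdump file.  Below is a minimal
sketch in Python (not part of Amanda).  It assumes the FAILED lines in
the file look like the ones quoted above, each on a single unwrapped
line, and that disk names contain no spaces.  The names failed_dles and
FAILED_RE are made up for this illustration.

```python
import re
from collections import defaultdict

# Pattern for planner failure lines as they appear in the excerpt above, e.g.
#   planner: FAILED host.example /boot 20170713000603 0 "[hmm, no error indicator!]"
# Assumption: one failure per line, disk names without spaces.
FAILED_RE = re.compile(
    r'planner: FAILED (?P<host>\S+) (?P<disk>\S+) (?P<stamp>\d+) (?P<level>\d+) "(?P<msg>.*)"'
)


def failed_dles(lines):
    """Return {host: [(disk, message), ...]} for every planner FAILED line."""
    failures = defaultdict(list)
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failures[match.group("host")].append(
                (match.group("disk"), match.group("msg"))
            )
    return dict(failures)
```

Calling failed_dles(open("/var/log/amanda/Daily/amdump.1")) on two
different runs and comparing the sets of host names would show how the
list of failed machines shifts when disklist entries are commented out.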


To give an idea of how big this installation is, here is the output of:
amstatus Daily --summary
Using /var/log/amanda/Daily/amdump.1
From Sun Jul 16 07:45:03 BST 2017


SUMMARY          part      real  estimated
                           size       size
partition       : 967
estimated       : 166                  578g
flush           : 240      2711g
failed          : 561                    0g           (  0.00%)
wait for dumping:   0                    0g           (  0.00%)
dumping to tape :   0                    0sunit           (  0.00%)
dumping         :   0         0g         0g (  0.00%) (  0.00%)
dumped          : 166       539g       578g ( 93.13%) ( 93.13%)
wait for writing: 166       539g       578g ( 93.13%) ( 93.13%)
wait to flush   :  62       269g       269g (100.00%) (  0.00%)
writing to tape :   0         0g         0g (  0.00%) (  0.00%)
failed to tape  :   0         0g         0g (  0.00%) (  0.00%)
taped           : 178      2441g      2441g (100.00%) ( 74.21%)
  tape 1        : 178      2441g      2441g (100.00%) Daily-39 (178 chunks)
16 dumpers idle : 0
taper 0 status: Idle
taper qlen: 229
network free kps:  10000000
holding space   :     10191g (122.95%)




> 
> Kind regards
> Jose M Calhariz
> 
>


Now I am looking into the code in planner.c to see if I can insert
debug code and understand why it is failing to launch some ssh commands.

Kind regards
Jose M Calhariz

-- 
        The truth is the best camouflage. Nobody believes it.
                --  Max Frisch, Swiss writer.
