On Thu, Jul 13, 2017 at 04:07:19PM +0100, Jose M Calhariz wrote:
>
> Hi,
>
> I have another installation of amanda. This one is very big: 120
> hosts, 750 DLEs, and using ssh authentication. Is anyone using an
> installation of this size?
>
> My problem is that this setup works for some days without problems, and
> on other days it cannot back up all the servers.
>
> I have investigated the clients. When there are problems, nothing has
> tried to contact the faulty clients: no traffic is being generated from
> the server to the client.
>
> Now I am trying to make sense of the server logs. Looking into the
> planner logs I see messages from the successful servers but not the
> names of the faulty servers. Looking into /var/log/amanda/Daily I see
> messages in the log and amdump files requesting estimates and, in the
> same second, saying:
>
>
> amdump.20170713000603:planner: time 0.055: setting up estimates for
> hostanme.domain.name:/
> amdump.20170713000603:setup_estimate: hostanme.domain.name:/: command 0,
> options: none last_level 1 next_level0 5 level_days 6 getting estimates
> 0 (-3) 1 (-3) 2 (-3)
> amdump.20170713000603:planner: time 0.055: setting up estimates for
> hostanme.domain.name:/boot
> amdump.20170713000603:setup_estimate: hostanme.domain.name:/boot: command 0,
> options: none last_level 1 next_level0 4 level_days 7 getting estimates
> 0 (-3) 1 (-3) -1 (-3)
> amdump.20170713000603:planner: FAILED hostanme.domain.name / 20170713000603 0
> "[hmm, no error indicator!]"
> amdump.20170713000603:planner: FAILED hostanme.domain.name /boot
> 20170713000603 0 "[hmm, no error indicator!]"
>
> I am out of ideas about things to do to find a possible reason for the
> failure. Can anyone help me?
If I comment out entries in the disklist, the list of failed machines changes.
To give an idea of how big this installation is, here is the output of:
amstatus Daily --summary
Using /var/log/amanda/Daily/amdump.1
From Sun Jul 16 07:45:03 BST 2017
SUMMARY           part      real  estimated
                            size       size
partition       :  967
estimated       :  166                 578g
flush           :  240     2711g
failed          :  561                   0g           (  0.00%)
wait for dumping:    0                   0g           (  0.00%)
dumping to tape :    0               0sunit           (  0.00%)
dumping         :    0        0g         0g (  0.00%) (  0.00%)
dumped          :  166      539g       578g ( 93.13%) ( 93.13%)
wait for writing:  166      539g       578g ( 93.13%) ( 93.13%)
wait to flush   :   62      269g       269g (100.00%) (  0.00%)
writing to tape :    0        0g         0g (  0.00%) (  0.00%)
failed to tape  :    0        0g         0g (  0.00%) (  0.00%)
taped           :  178     2441g      2441g (100.00%) ( 74.21%)
  tape 1        :  178     2441g      2441g (100.00%) Daily-39 (178 chunks)
16 dumpers idle :    0
taper 0 status: Idle
taper qlen: 229
network free kps: 10000000
holding space   : 10191g (122.95%)
>
> Kind regards
> Jose M Calhariz
>
>
Now I am looking at the code in planner.c to see if I can insert
debug code and understand why it is failing to launch some ssh commands.
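
In the meantime, something like the following could help track which
DLEs the planner marks as FAILED from one run to the next. It is only a
minimal sketch in Python, assuming unwrapped amdump lines of the form
quoted above (planner: FAILED host disk datestamp level "message") and
disk names without spaces. The function names failed_dles and
changed_hosts are my own, not part of Amanda.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the amdump excerpt in this thread:
#   planner: FAILED <host> <disk> <datestamp> <level> "<message>"
# re.search is used so that a leading "amdump.<datestamp>:" prefix
# (as produced by grep over several files) is tolerated.
FAILED_RE = re.compile(
    r'planner: FAILED (?P<host>\S+) (?P<disk>\S+) '
    r'(?P<stamp>\d+) (?P<level>\d+) "(?P<msg>.*)"'
)


def failed_dles(lines):
    """Return {host: [(disk, message), ...]} for each planner FAILED line."""
    failures = defaultdict(list)
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failures[match.group("host")].append(
                (match.group("disk"), match.group("msg"))
            )
    return dict(failures)


def changed_hosts(run_a, run_b):
    """Return the sorted hosts that failed in exactly one of two runs."""
    return sorted(set(run_a) ^ set(run_b))
```

Feeding the lines of two amdump files through failed_dles() and passing
both results to changed_hosts() would show which hosts moved in or out
of the failed set, for example after commenting out disklist entries.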
Kind regards
Jose M Calhariz
--
Truth is the best camouflage. Nobody believes it.
-- Max Frisch, Swiss writer.