On Fri, Jul 14, 2017 at 09:02:38AM +0200, Stefan G. Weichinger wrote:
> On 2017-07-13 at 17:07, Jose M Calhariz wrote:
> > 
> > Hi,
> > 
> > I have another installation of amanda.  This one is very big: 120
> > hosts, 750 DLEs, and ssh authentication.  Is anyone using an
> > installation of this size?
> > 
> > My problem is that this setup works for some days without problems, and
> > on other days it cannot back up all the servers.
> > 
> > I have investigated the clients.  When there are problems, nothing tries
> > to contact the faulty clients; no traffic is generated from the
> > server to the client.
> > 
> > Now I am trying to make sense of the server logs.  Looking into the
> > planner logs I see messages from the successful servers but not the
> > names of the faulty servers.  Looking into /var/log/amanda/Daily I see
> > messages in the log and amdump files requesting estimates and, in the
> > same second, saying:
> > 
> > 
> > amdump.20170713000603:planner: time 0.055: setting up estimates for 
> > hostanme.domain.name:/
> > amdump.20170713000603:setup_estimate: hostanme.domain.name:/: command 0, 
> > options: none    last_level 1 next_level0 5 level_days 6    getting 
> > estimates 0 (-3) 1 (-3) 2 (-3)
> > amdump.20170713000603:planner: time 0.055: setting up estimates for 
> > hostanme.domain.name:/boot
> > amdump.20170713000603:setup_estimate: hostanme.domain.name:/boot: command 
> > 0, options: none    last_level 1 next_level0 4 level_days 7    getting 
> > estimates 0 (-3) 1 (-3) -1 (-3)
> > amdump.20170713000603:planner: FAILED hostanme.domain.name / 20170713000603 
> > 0 "[hmm, no error indicator!]"
> > amdump.20170713000603:planner: FAILED hostanme.domain.name /boot 
> > 20170713000603 0 "[hmm, no error indicator!]"
> > 
> > I am out of ideas about things to do to find a possible reason for the
> > failure.  Can anyone help me?
> 
> What happens if you amcheck or amdump only these servers?

amcheck always runs without problems, whether it is run against only the
affected client or against all the machines.


> 
> Does it always happen with the same clients or does that vary?

It varies; the set of affected machines may grow or shrink between
amdump runs.  But all the other tools run without problems.
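
To compare the affected set between runs, the failing hosts can be pulled
out of the amdump files.  A minimal sketch in Python, assuming each
"planner: FAILED" record sits on a single line in the real file (the lines
quoted above were wrapped by the mail client); the function name
failed_dles and the sample text are only for illustration:

```python
import re
from collections import defaultdict

# Matches planner failure records of the form quoted in this thread:
#   planner: FAILED <host> <disk> <timestamp> <level> "[message]"
FAILED_RE = re.compile(
    r'planner: FAILED\s+(?P<host>\S+)\s+(?P<disk>\S+)\s+'
    r'(?P<stamp>\d{14})\s+(?P<level>-?\d+)\s+"(?P<msg>.*)"'
)

def failed_dles(log_text):
    """Return {host: [disk, ...]} for every planner FAILED line in log_text."""
    failed = defaultdict(list)
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed[match.group("host")].append(match.group("disk"))
    return dict(failed)

# Sample taken from the log excerpt quoted above, unwrapped to one line each.
sample = (
    'planner: FAILED hostanme.domain.name / 20170713000603 0 '
    '"[hmm, no error indicator!]"\n'
    'planner: FAILED hostanme.domain.name /boot 20170713000603 0 '
    '"[hmm, no error indicator!]"\n'
)

print(failed_dles(sample))
```

Running this over the amdump file of each night and comparing the host
keys would show which clients enter or leave the affected set.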


> 
> Does it only happen with specific clients: this and that OS, version of
> amanda, or so.

Almost all the clients run Debian 7, 8, or 9.  They use amanda from
Debian, v3.3.1, v3.3.6 and v3.3.9.  The clients are spread across two
sites, and affected clients occur at both sites.


> 
> What is your estimate timeout for that config? maybe share the
> amanda.conf

etimeout -18000 #5 Hours # number of seconds per filesystem for estimates

I had not noticed that the estimate timeout was 5 hours.  The amdump was
progressing after all the other machines returned an estimate.

The setup has evolved a great deal since these 5 hours were needed.
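
If I read the amanda.conf man page correctly, a positive etimeout is
applied per estimate, while a negative value is the total time planner
waits per client.  A small sketch of that arithmetic, under that
assumption (the function name planner_wait_seconds is hypothetical, not
part of amanda):

```python
def planner_wait_seconds(etimeout, n_estimates):
    """Upper bound, in seconds, on how long planner waits for one client.

    Assumption (from my reading of amanda.conf(5)): a positive etimeout
    applies to each estimate, a negative etimeout is a total per client.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * n_estimates

# The value from this config: negative, so a flat 5 hours per client,
# regardless of how many estimates are requested.
print(planner_wait_seconds(-18000, 5) / 3600)

# For comparison, a positive 300 seconds with 8 estimates (hypothetical
# numbers): four DLEs with two levels each.
print(planner_wait_seconds(300, 8) / 60)
```

With the negative value, a client that never answers would hold planner
for up to 5 hours, which matches what I saw.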

I have to request authorization before disclosing logs or config files
to the list.  But I see no problem in disclosing it to the developers
of amanda.


> 
> What version of amanda on the server?
> 
> ... and ... ;-)


The server runs Debian 9 with amanda 3.3.9 from Debian, but the problem
started with Debian 8.  I did the upgrade to see whether it would solve
the problem and, if not, to be able to report against the latest versions.


> 
> I assume JLM would suggest to debug that somehow by increasing log
> levels etc.
> 
>

How do I increase the debug levels?  

Kind regards
Jose M Calhariz


-- 

You need to know how to sell yourself well, without it looking like exploitation

--Cindy Crawford
