On Fri, Jul 14, 2017 at 09:02:38AM +0200, Stefan G. Weichinger wrote:
> On 2017-07-13 at 17:07, Jose M Calhariz wrote:
> > 
> > Hi,
> > 
> > I have another installation of amanda. This one is very big: 120
> > hosts, 750 DLEs, and it uses ssh authentication. Is anyone running an
> > installation of this size?
> > 
> > My problem is that this setup works for some days without problems,
> > and on other days it cannot back up all the servers.
> > 
> > I have investigated the clients. When there are problems, nothing
> > tries to contact the faulty clients; no traffic is generated from the
> > server to the client.
> > 
> > Now I am trying to make sense of the server logs. In the planner logs
> > I see messages for the successful servers but not the names of the
> > faulty servers. In /var/log/amanda/Daily I see messages in the log and
> > amdump requesting estimates and, in the same second, saying:
> > 
> > 
> > amdump.20170713000603:planner: time 0.055: setting up estimates for hostanme.domain.name:/
> > amdump.20170713000603:setup_estimate: hostanme.domain.name:/: command 0, options: none last_level 1 next_level0 5 level_days 6 getting estimates 0 (-3) 1 (-3) 2 (-3)
> > amdump.20170713000603:planner: time 0.055: setting up estimates for hostanme.domain.name:/boot
> > amdump.20170713000603:setup_estimate: hostanme.domain.name:/boot: command 0, options: none last_level 1 next_level0 4 level_days 7 getting estimates 0 (-3) 1 (-3) -1 (-3)
> > amdump.20170713000603:planner: FAILED hostanme.domain.name / 20170713000603 0 "[hmm, no error indicator!]"
> > amdump.20170713000603:planner: FAILED hostanme.domain.name /boot 20170713000603 0 "[hmm, no error indicator!]"
> > 
> > I am out of ideas about how to find a possible reason for the
> > failure. Can anyone help me?
> 
> What happens if you amcheck or amdump only these servers?

amcheck always runs without problems, whether I run it against only that client or against all the machines.

> 
> Does it always happen with the same clients or does that vary?

It varies: the set of affected machines may grow or shrink between amdump runs. All the other tools run without problems.

> 
> Does it only happen with specific clients: this and that OS, version of
> amanda, or so.

Almost all the clients run Debian 7, 8 or 9. They use amanda from Debian: v3.3.1, v3.3.6 and v3.3.9. The clients are spread across two sites, and affected clients occur at both sites.

> 
> What is your estimate timeout for that config? maybe share the
> amanda.conf

etimeout -18000 #5 Hours # number of seconds per filesystem for estimates

I had not noticed that the estimate timeout was 5 hours. The amdump did continue once all the other machines had returned an estimate. The setup has evolved a lot since those 5 hours were needed.

I have to request authorization before disclosing logs or config files to the list, but I see no problem in disclosing them to the amanda developers.

> 
> What version of amanda on the server?
> 
> ... and ... ;-)

The server runs Debian 9 with amanda 3.3.9 from Debian, but the problem started with Debian 8. I did the upgrade to see whether it solved the problem, or at least to be able to report against the latest versions.

> 
> I assume JLM would suggest to debug that somehow by increasing log
> levels etc.
> 
> 

How do I increase the debug levels?

Kind regards
Jose M Calhariz

-- 
--
You need to know how to sell yourself well, without it looking like exploitation
--Cindy Crawford
