I have been testing the new amanda 3.5 and 3.5.1 on Debian, on my
servers, before uploading the package to Debian sid.
Version 3.5 successfully solved the problem seen in big amanda
installations, where the failure of one or two clients cascades into
more failures during backups or "amcheck -c".
I tested 3.5.1 on my small server with success and uploaded it to
Debian. But now that I have installed it on my big server, with 130
clients and 700 DLEs, I am again seeing the cascade problem, where one
or two failures cause multiple other failures. I am seeking help to
pinpoint the problem. It may be a regression in the code or, most
probably, a problem with my compiled packages for stretch.
Main question: which logs should I look at, or what patch can I apply,
to try to identify the root cause of the problem?
Currently I have 3 hosts that are unreachable. This causes
unpredictable failures: sometimes everything works as expected, but
most of the time additional clients fail with:
selfcheck request failed: error sending REQ: write error to: Broken
Looking into one of the clients that fail, I do not see an obvious
error message in the amanda logs.
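
To separate the clients that are really unreachable from the ones that
fail only as a side effect, a quick TCP probe can be run from the
server before "amcheck -c". Below is a minimal sketch in Python. It
assumes the clients listen on TCP port 10080 (the registered amanda
port, as used with bsdtcp authentication); the function names are only
illustrative and are not part of amanda.

```python
import socket
import sys

# Assumption: clients accept TCP connections on the registered amanda port.
AMANDA_PORT = 10080


def is_reachable(host, port=AMANDA_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def split_clients(hosts, port=AMANDA_PORT, timeout=3.0):
    """Partition hosts into (reachable, unreachable) lists, keeping order."""
    reachable, unreachable = [], []
    for host in hosts:
        if is_reachable(host, port, timeout):
            reachable.append(host)
        else:
            unreachable.append(host)
    return reachable, unreachable


if __name__ == "__main__":
    # Usage: python3 probe.py client1 client2 ...
    ok, bad = split_clients(sys.argv[1:])
    print("reachable:", " ".join(ok))
    print("unreachable:", " ".join(bad))
```

If a client shows up as reachable here but still fails in amcheck with
the write error above, that would suggest the failure is a side effect
of the unreachable hosts and not a problem on that client.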
I am willing to continue looking into the logs, to run modified code,
or even to run a debugger. For privacy reasons I will post here a
summary of the findings or send by private email the full logs and
The server runs Debian 9 (stretch) and uses amanda 3.5.1 from Debian
sid, recompiled for stretch. Most of the clients are Debian 9 and run
the amanda client from Debian, version 3.3.9-5, or 3.5 and 3.5.1
(backported from Debian sid).
The patch that I have applied for Debian should not have changed this
behaviour and can be checked in:
Jose M Calhariz