About a month ago, our backups went from taking ~8 hours to taking 16-24 hours or more. During this time, two specific shares on one of our servers, alva, started giving data timeout errors (e.g., alva /home lev 4 FAILED [data timeout]). Other shares still backed up fine, even if they were the same size or larger and were located on the same RAID array on the same server.

Furthermore, Amanda would often report the failures twice:

  alva /home lev 0 FAILED [data timeout]
  alva /home lev 0 FAILED [data timeout]

If I checked the server's ps list while Amanda was running, I could see two different attempts to back up a drive running at the same time.
The problem appeared to start when I upgraded the OS (and hence Amanda) on another Amanda client, but I don't know how upgrading one client would affect the speed of backing up another client (unless I've hit a bug in Amanda).

Any suggestions on how to fix this problem? The symptoms seem odd enough that I'm not even sure where to begin.

We're running Amanda 2.5.0p2-4, as provided with CentOS 5. I know that newer versions of Amanda are available, but I'd prefer to stick with what came with the OS unless there's a compelling reason to upgrade.

Thank you.

Josh Kelley
