I have a client with an amandad that has been running since Sep 23:

  backup 97592 0.0 0.1 26780 7016 ?? Ss 23Sep10 40:30.43 amandad

Most of the backups on that client still work fine, but two DLEs fail nightly. On the server, you get:

  1286525084.841860: chunker: getcmd: START 20101007210002
  1286525084.841877: chunker: getcmd: PORT-WRITE 00-00195 /holding/20101007210002/someclient._somedle.0 someclient ffffffff9ffeffffffff7f /somedle 0 1970:1:1:0:0:0 51 2000 APPLICATION 111136 |;auth=BSD;compress-fast;index;exclude-file=.no-amanda-backup;exclude-file=.nobak;exclude-file=.noback;exclude-file=.nodump;
  1286525084.842069: chunker: stream_server opening socket with family 2 (requested family was 2)
  1286525084.842086: chunker: try_socksize: receive buffer size is 65536
  1286525084.844115: chunker: bind_portrange2: Try port 11017: Available - Success
  1286525084.844135: chunker: stream_server: waiting for connection: 0.0.0.0.11017
  1286525084.844142: chunker: putresult: 23 PORT
  1286525084.847225: chunker: stream_accept: connection from 127.0.0.1.11002
  1286525084.847233: chunker: try_socksize: receive buffer size is 65536
  1286525264.872340: chunker: putresult: 10 FAILED
  1286525264.872462: chunker: pid 18935 finish time Fri Oct 8 02:07:44 2010

The amandad log on the client shows nothing at the 1286525084 timestamp (yes, the hosts in question have good time sync). It does show sendbackup entries after the 3 minute timeout on the server above (the 1286525264 timestamp). So the client amandad seems to just be slow in responding.

It's not clear why this long-running amandad is slow in responding for a couple of DLEs, but it's definitely abnormal to have such a long-running amandad to begin with. lsof shows lots of open file descriptors like so:

  amandad 97592 backup 609u PIPE 0xffffff01dccfa000 16384 ->0xffffff01dccfa158

There may be a descriptor leak bug, but that's sort of unimportant since _usually_ amandad runs only briefly. The real question is: why is amandad not exiting? Has anyone seen this before?

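For anyone who wants to check the timeout arithmetic against their own logs, here is a rough Python sketch that measures the silent gap just before the FAILED result. It only assumes the "epoch.microseconds: program: message" layout visible in the excerpt above. The function names (parse_line, gap_before_failure) are made up for this illustration and are not part of Amanda.

```python
import re

# Assumed layout, taken from the excerpt: "<epoch seconds>: <program>: <message>"
LINE_RE = re.compile(r"^(\d+\.\d+): (\S+): (.*)$")


def parse_line(line):
    """Return (timestamp, program, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return float(m.group(1)), m.group(2), m.group(3)


def gap_before_failure(lines):
    """Seconds between the FAILED putresult and the log line just before it."""
    previous_ts = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _program, message = parsed
        if message.startswith("putresult:") and "FAILED" in message:
            return None if previous_ts is None else ts - previous_ts
        previous_ts = ts
    return None


# The last four chunker lines from the server log quoted above.
sample_lines = [
    "1286525084.847225: chunker: stream_accept: connection from 127.0.0.1.11002",
    "1286525084.847233: chunker: try_socksize: receive buffer size is 65536",
    "1286525264.872340: chunker: putresult: 10 FAILED",
    "1286525264.872462: chunker: pid 18935 finish time Fri Oct 8 02:07:44 2010",
]

gap = gap_before_failure(sample_lines)
print(round(gap, 1))  # roughly 180 seconds, i.e. the 3 minute timeout
```

On the excerpt this comes out at about 180 seconds, which matches the 3 minute timeout: the chunker accepted the local connection and then saw nothing further until it gave up.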
I plan to kill amandad on the client, but I'll leave it running for a bit longer in case there might be something that can be learned. Unfortunately, this is amanda-2.6.1p1 on the client, so interest in learning about this anomaly in that code is likely low.
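While it is still running, one thing that could be tracked is whether the number of open pipes keeps growing. Below is a minimal Python sketch that tallies descriptor types from lsof text output. It assumes the default lsof column order (COMMAND PID USER FD TYPE ...) and simple whitespace-separated fields, which may not hold on every platform. The function name count_fd_types is hypothetical, and the only sample input is the single line quoted earlier.

```python
from collections import Counter


def count_fd_types(lsof_text, pid):
    """Tally the TYPE column for every lsof line belonging to the given pid.

    Assumes the default column order COMMAND PID USER FD TYPE ..., split on
    whitespace. Lines that are too short or belong to another pid (including
    any header line) are skipped.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[1] != str(pid):
            continue
        counts[fields[4]] += 1
    return counts


# The one descriptor line quoted in this post; real input would be the
# full output captured from lsof for the amandad process.
sample = "amandad 97592 backup 609u PIPE 0xffffff01dccfa000 16384 ->0xffffff01dccfa158\n"

counts = count_fd_types(sample, 97592)
print(counts["PIPE"])
```

Feeding it the output of `lsof -p 97592` captured a few hours apart would show whether the PIPE count is still climbing or has levelled off.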
