I have a client with an amandad that has been running since Sep 23...

backup     97592  0.0  0.1 26780  7016  ??  Ss   23Sep10  40:30.43 amandad

Most of the backups on that client still work fine.  But two DLEs
fail nightly.  On the server, you get:

1286525084.841860: chunker: getcmd: START 20101007210002
1286525084.841877: chunker: getcmd: PORT-WRITE 00-00195 
/holding/20101007210002/someclient._somedle.0 someclient ffffffff9ffeffffffff7f 
/somedle 0 1970:1:1:0:0:0 51
2000 APPLICATION 111136 
|;auth=BSD;compress-fast;index;exclude-file=.no-amanda-backup;exclude-file=.nobak;exclude-file=.noback;exclude-file=.nodump;
1286525084.842069: chunker: stream_server opening socket with family 2 
(requested family was 2)
1286525084.842086: chunker: try_socksize: receive buffer size is 65536
1286525084.844115: chunker: bind_portrange2: Try  port 11017: Available - 
Success
1286525084.844135: chunker: stream_server: waiting for connection: 0.0.0.0.11017
1286525084.844142: chunker: putresult: 23 PORT
1286525084.847225: chunker: stream_accept: connection from 127.0.0.1.11002
1286525084.847233: chunker: try_socksize: receive buffer size is 65536
1286525264.872340: chunker: putresult: 10 FAILED
1286525264.872462: chunker: pid 18935 finish time Fri Oct  8 02:07:44 2010


The amandad log on the client shows nothing at the 1286525084
timestamp (yes, the hosts in question have good time sync).

It does show sendbackup entries, but only after the 3-minute timeout
on the server above (the 1286525264 timestamp).  So the client amandad
seems to just be slow in responding.
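
For anyone who wants to check the 3-minute figure, here is a minimal
Python sketch that subtracts the leading epoch timestamps of the last
chunker line before the stall and the FAILED line.  The helper name
log_timestamp is made up for illustration; the two log lines are copied
from the chunker output above.

```python
from decimal import Decimal


def log_timestamp(line: str) -> Decimal:
    """Return the leading epoch timestamp of a debug-log line.

    Assumes the line starts with '<seconds>.<microseconds>:' as in the
    chunker output quoted above. Decimal avoids float rounding noise.
    """
    return Decimal(line.split(":", 1)[0].strip())


# Last chunker activity before the stall, and the failure itself.
last_activity = "1286525084.847233: chunker: try_socksize: receive buffer size is 65536"
failure = "1286525264.872340: chunker: putresult: 10 FAILED"

gap = log_timestamp(failure) - log_timestamp(last_activity)
print(f"chunker waited {gap} seconds before FAILED")
```

The gap comes out at just over 180 seconds, which matches a 3-minute
timeout on the server side.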


It's not clear why this long-running amandad is slow in responding for
a couple of DLEs, but it's definitely abnormal to have such a
long-running amandad to begin with.


lsof shows lots of open file descriptors like so:

amandad 97592 backup  609u  PIPE 0xffffff01dccfa000    16384        
->0xffffff01dccfa158
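
To put a number on "lots", the lsof output can be tallied by its TYPE
column.  The following is only a rough Python sketch: the function name
count_fd_types is hypothetical, it assumes the usual lsof column order
(COMMAND PID USER FD TYPE ...), and the sample text is just the one
entry quoted above, including its mail-wrapped continuation line.  In
practice the text would come from something like "lsof -p 97592".

```python
from collections import Counter


def count_fd_types(lsof_text: str) -> Counter:
    """Count descriptors per TYPE column in lsof output.

    Assumes the default column order COMMAND PID USER FD TYPE ...;
    lines with fewer than five fields (headers cut short, wrapped
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        counts[fields[4]] += 1
    return counts


# The single entry quoted in this message, wrapped as it was mailed.
sample = """\
amandad 97592 backup  609u  PIPE 0xffffff01dccfa000    16384
->0xffffff01dccfa158
"""

counts = count_fd_types(sample)
print(counts["PIPE"], "pipe descriptor(s) in the sample")
```

Run against the full lsof output for the process, a large PIPE count
would support the descriptor-leak theory.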

There may be a descriptor leak bug, but that's sort of unimportant since
_usually_ amandad runs only briefly.

The real question is: why is amandad not exiting?

Has anyone seen this before?

I plan to kill amandad on the client, but I'll leave it running for a
bit longer in case there might be something that can be learned.
Unfortunately, this is amanda-2.6.1p1 on the client, so interest in
learning about this anomaly in that code is likely low.
