I have been looking at the enormous amount of debug files, and what I can see is that always there are "fail" lines in the dumper.*.debug files of the type: 1207905536.305211: dumper: connect_portrange: connect to 127.0.0.2.10080 failed: Cannot assign requested address "
However, that seems to be normal, because even in normal cases, these lines are present. In the case there are these "cannot read header" problems, the chunker.*.debug file report an error. For example, today the same DLE /home/charles/pictures failed in the chuncker: " 1207647127.918807: chunker: pid 8024 ruid 1002 euid 1002: start at Tue Apr 8 11:32:07 2008 1207647127.919828: chunker: pid 8024 ruid 1002 euid 1002: rename at Tue Apr 8 11:32:07 2008 1207647127.919889: chunker: getcmd: START 20080408110501 1207647127.919912: chunker: getcmd: PORT-WRITE 03-00013 /work-1/amanda/dumps/20080408110501/fiume.localnet._home_charles_pictures.1 fiume.localnet ffff ffff9ffeffffffff00 /home/charles/pictures 1 2008:4:4:9:7:56 1048576 GNUTAR 928 |;auth=bsdtcp;index; 1207647127.920075: chunker: try_socksize: receive buffer size is 65536 1207647127.924152: chunker: bind_portrange2: Try port 11028: Available - Success 1207647127.924186: chunker: stream_server: waiting for connection: 0.0.0.0.11028 1207647127.924193: chunker: putresult: 23 PORT 1207647127.928237: chunker: stream_accept: connection from 127.0.0.1.11002 1207647127.928247: chunker: try_socksize: receive buffer size is 65536 1207647137.985112: chunker: putresult: 10 FAILED 1207647137.985202: chunker: pid 8024 finish time Tue Apr 8 11:32:17 2008 " I think the reason for the "cannot read header" message is because the file left in the dumps disk is zero length: " ll /work-1/amanda/dumps/200804*/* -rw------- 1 amandabackup disk 0 Apr 6 11:22 /work-1/amanda/dumps/20080406110501/fiume.localnet._home_charles_pictures.1 -rw------- 1 amandabackup disk 0 Apr 8 11:32 /work-1/amanda/dumps/20080408110501/fiume.localnet._home_charles_pictures.1 -rw------- 1 amandabackup disk 0 Apr 11 11:18 /work-1/amanda/dumps/20080411110501/fiume.localnet._home_charles_pictures.1 " If there are problems, it is always this /home/charles/pictures DLE which fails and sometimes also another, but often the other recovers, as today: " fiume.localnet /usr lev 1 FAILED [cannot read header: got 0 bytes instead of 32768]fiume.localnet /usr lev 1 was successfully retried " Thanks for looking. Charles On Thu, 10 Apr 2008 10:18:10 -0400 "Dustin J. Mitchell" <[EMAIL PROTECTED]> wrote: > On Tue, Apr 8, 2008 at 10:38 AM, Charles Stroom > <[EMAIL PROTECTED]> wrote: > > FAILURE DUMP SUMMARY: > > fiume.localnet /home/charles/pictures lev 1 FAILED [cannot read > > header: got 0 bytes instead of 32768] > > fiume.localnet /home/charles/pictures lev 1 FAILED [too many > > dumper retry: "[request failed: Resource temporarily unavailable]"] > > fiume.localnet /home/charles/pictures lev 1 FAILED [cannot read > > header: got 0 bytes instead of 32768] > > My apologies -- now that the 2.6.0 release has been out, I've all been > excitedly hacking on new work for 2.6.1. A side-effect is that I've > just been "skimming" amanda-users, and I missed this email. > > There could be a number of problems underlying this, unfortunately. > Basically, the header didn't arrive on the server, but there's no > indication of what caused that. The next step is to look around in > the debug logs on both the server and the client for anything unusual > about those runs (unusual = different from the runs that succeeded). > On the server, have a look at dumper.*. On the client, have a look at > amandad.* and sendbackup*. > > "Resource temporarily unavailable" corresponds to EAGAIN, and is > usually seen with non-blocking IO, which Amanda doesn't use, so I'm > not sure what to make of that information. The dumper debug log may > clarify that. > > Dustin > -- Charles Stroom email: charles at no-spam.stremen.xs4all.nl (remove the "no-spam.")
