On 2007-08-02 15:56, Mike Gallant wrote:
I have an amanda (2.5.1p3) server (solaris10) with seven amanda clients (2.4.4) and a new amanda (2.5.2) client on new hardware. I have been getting a "index tee cannot write [Broken pipe]" error message. I have looked at all the various aspects I can think of but still no joy. Could someone point me in the right direction for resolving this.
The error "index tee cannot write" is the collateral damage of some earlier problem. The index tcp-connection got closed, probably because of some other error. See also: http://wiki.zmanda.com/index.php/Mesg_read:_Connection_reset_by_peer which has a similar effect on the client. Don't focus on this "index tee" but look earlier. I'm interested in the error messages from the server-side as well.
I have included the sendbackup.debug (client) and a good part of the amandad (client) missing the repetitive entries.Thanks, Mike sendbackup: debug 1 pid 14529 ruid 105 euid 105: start at Thu Aug 2 01:55:27 2007 sendbackup: version 2.5.2 [...] sendbackup: time 3.199: 91: normal(|): DUMP: Estimated 5852872 blocks (2857.85MB) on 0.04 tapes.
So you're trying to dump less than 3 Gbyte.
sendbackup: time 3.245: 91: normal(|): DUMP: Dumping (Pass III) [directories] sendbackup: time 16237.077: index tee cannot write [Broken pipe]
Here the index tcp-connection got broken, proabably because the server gave up on this host. The real problem happened much earlier. (more than 4 hours after startup -- a really patient server :-) )
sendbackup: time 16237.077: pid 14531 finish time Thu Aug 2 06:26:04 2007
vReading conf file "/etc/opt/slis/amanda/amanda-client.conf".
[...]
amandad: time 0.005: security_getdriver(name=BSD) returns ff2dd95c amandad: version 2.5.2 amandad: time 0.005: build: VERSION="Amanda-2.5.2" amandad: time 0.005: BUILT_DATE="Thu May 24 22:05:16 EDT 2007"
Did it work correct before? Of do you struggle with the problem until now?
[...] amandad: time 0.024: accept recv REQ pkt: <<<<< SERVICE sendbackup DUMP /zones 0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;index; >>>>> amandad: time 0.024: creating new service: sendbackup DUMP /zones 0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;index; amandad: time 0.028: sending ACK pkt: <<<<< >>>>> amandad: time 0.028: dgram_send_addr(addr=30db0, dgram=ff2dddac) amandad: time 0.028: (sockaddr_in *)30db0 = { 2, 731, 192.168.36.38 } amandad: time 0.028: dgram_send_addr: ff2dddac->socket = 0amandad: time 0.072: security_streaminit(stream=418f8, driver=ff2dd95c (BSD))amandad: time 0.072: try_socksize: send buffer size is 65536 amandad: time 0.072: try_socksize: receive buffer size is 65536 amandad: time 0.072: stream_server: Could not bind to any port: Invalid argument
Here is the problem, I believe, although I do not know what is wrong. Some syscall is giving "EINVAL" on one of the arguments. Do you have some weird network setup? e.g. do you have a lower limit then 64k for UDP-packets etc.
amandad: time 0.072: stream_server: Retrying entire range after 10 second delay.
... and so on ...
amandad: time 1800.842: security_seterror(handle=30d90, driver=ff2dd95c (BSD) error=can't create server stream: Invalid argument) amandad: time 1800.842: couldn't open stream to server: can't create server stream: Invalid argument amandad: time 1800.842: security_streaminit(stream=418f8, driver=ff2dd95c (BSD))
The upper layers repeating what the lower layers were saying...
amandad: time 1800.842: try_socksize: send buffer size is 65536 amandad: time 1800.842: try_socksize: receive buffer size is 65536 amandad: time 1800.842: stream_server: Could not bind to any port: Invalid argument amandad: time 1800.842: stream_server: Retrying entire range after 10 second delay.
and starting again (not sure why??)
amandad: time 5402.381: stream_server: bind(in6addr_any) failed: Invalid argument
in6addr_any?? suddenly using IPV6?? I believe 2.5.2p1 fixed some IPV6 bugs. (It has at least less bugs, so better try that one. I'm using it without problems.)
amandad: time 16247.142: sending REP pkt: <<<<< CONNECT DATA -1 MESG -1 INDEX -1 OPTIONS features=ffffffff9ffeffffffff00; >>>>>
More trouble, proabably because of the previous ones. Normally amandad should reply with the tcp-ports it has opened for the three streams, and "-1" is of course not valid.
amandad: time 16247.142: dgram_send_addr(addr=49e40, dgram=ff2dddac)
amandad: time 16247.142: (sockaddr_in *)49e40 = { 2, 731, 192.168.36.38 }
amandad: time 16247.142: dgram_send_addr: ff2dddac->socket = 0
amandad: time 16257.150: timeout
amandad: time 16257.150: timeout waiting for ACK for our REP
But after 16000 seconds the server isn't expecting anything from this host probably. -- Paul Bijnens, xplanation Technology Services Tel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512 http://www.xplanation.com/ email: [EMAIL PROTECTED] *********************************************************************** * I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, ^^, * * F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, * * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, * * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, * * init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... * * ... "Are you sure?" ... YES ... Phew ... I'm out * ***********************************************************************
