----- Original Message -----
From: "John R. Jackson" <[EMAIL PROTECTED]>
To: "Ryan Williams" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, April 30, 2001 7:20 PM
Subject: Re: Strange Amanda Error: sendbackup: index tee cannot write [Broken pipe]


> >Below are the two pertinent parts (I think) of an error that is occurring
> >on an everyday basis on this server.  ...
>
> What version of Amanda?  Same version on client and server?
2.4.2 on both client and server

>
> >... There are 11 other servers that are being backed up
> >just fine but this one just does not want to work.  ...
>
> So what's different about it?  (just kidding :-).
>
> >... Of course it was upgraded because we could not
> >find a network card that worked decently ...
>
> Are we talking 100 Mbit Ethernet?  Any chance you have a duplex problem
> between the client and switch?  Getting that wrong can cause truly
> amazingly bad performance.
>

It is a 100 Mbit card set to autoselect between all of the 100baseTX and
10baseT modes. The network is 10baseT. I am not sure if it is full or half
duplex, so it could be a duplex problem. I believe that our backup network
can only support half duplex.

    rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 10.0.0.203 netmask 0xffffff00 broadcast 10.0.0.255
        inet6 fe80::250:baff:fe88:c760%rl0 prefixlen 64 scopeid 0x2
        ether 00:50:ba:88:c7:60
        media: autoselect (none) status: active
        supported media: autoselect 100baseTX <full-duplex> 100baseTX 10baseT/UTP <full-duplex> 10baseT/UTP 100baseTX <hw-loopback>


What do you make of that? I don't see anywhere that it says what it is
actually set to. Maybe I should try manually setting the mode on the card.
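
As far as I can tell, the "media:" line shows the configured selection
followed by the negotiated result in parentheses, so "autoselect (none)"
would suggest that autonegotiation did not settle on a mode. A minimal
Python sketch for pulling those fields out of saved ifconfig output is
below. The name parse_media is made up for illustration, and the line
layout is assumed to match the sample above.

```python
import re

# Matches a line such as "media: autoselect (none) status: active".
# The layout is an assumption taken from the ifconfig sample in this message.
MEDIA_RE = re.compile(
    r"media:\s*(?P<configured>.+?)\s*\((?P<active>[^)]*)\)"
    r"(?:\s*status:\s*(?P<status>.+?))?\s*$"
)

def parse_media(ifconfig_text):
    """Return (configured, active, status) from the first 'media:' line, or None."""
    for line in ifconfig_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("media:"):
            continue  # skips "supported media:" and every other line
        match = MEDIA_RE.match(stripped)
        if match:
            return (
                match.group("configured"),
                match.group("active"),
                match.group("status"),  # None when no status is on the line
            )
    return None
```

If the active field comes back as "none", forcing one of the modes from
the supported list by hand (10baseT/UTP, for instance) seems like a
reasonable next test.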

> >  web45.inte ad0s1f lev 0 FAILED [data timeout]
> >  web45.inte ad0s1a lev 0 FAILED [could not connect to web45.internal]
> >  web45.inte ad0s1e lev 0 FAILED [could not connect to web45.internal]
>
> The first one says Amanda waited 30 minutes between getting started or
> getting a block of data and then gave up.  I'm guessing the other two
> failures were because amandad was still running on the client and so
> the new connections were not allowed.
>
> If you're running 2.4.2 or beyond, you could increase the dtimeout
> value in amanda.conf, but half an hour to wait on data is a long time.
> I suspect it means something else is wrong.
>
Half an hour should be plenty of time, I would think.

> It would also be useful to go to the client and look at sendbackup*debug
> in /tmp/amanda, in particular the start and stop time (first and last
> lines).
>

/usr/local/libexec/sendbackup: got input request: DUMP ad0s1e 0 1970:1:1:0:0:0 OPTIONS |;bsd-auth;srvcomp-fast;index;
  parsed request as: program `DUMP' disk `ad0s1e' lev 0 since 1970:1:1:0:0:0 opt `|;bsd-auth;srvcomp-fast;index;'
  waiting for connect on 2622, then 2623, then 2624
/usr/local/libexec/sendbackup: timeout on mesg port 2623
/usr/local/libexec/sendbackup: timeout on index port 2624
sendbackup: pid 79500 finish time Tue May  1 01:47:00 2001
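
To compare the start and finish times, the timestamps can be pulled out
of the debug file with a few lines of Python. This is only a sketch: it
assumes each timestamp line ends with "start time" or "finish time"
followed by a ctime-style date, as in the "finish time" line above. The
"start time" wording is an assumption, since that line is not shown here,
and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "... start time <ctime>" and "... finish time <ctime>".
# Only the "finish time" form appears in the log above; "start time" is a guess.
STAMP_RE = re.compile(r"\b(start|finish) time (.+?)\s*$")
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"

def run_times(debug_text):
    """Return a dict mapping 'start'/'finish' to a datetime for lines that have one."""
    times = {}
    for line in debug_text.splitlines():
        match = STAMP_RE.search(line)
        if match:
            times[match.group(1)] = datetime.strptime(match.group(2), CTIME_FORMAT)
    return times

def elapsed(debug_text):
    """Return finish minus start as a timedelta, or None if either one is missing."""
    times = run_times(debug_text)
    if "start" in times and "finish" in times:
        return times["finish"] - times["start"]
    return None
```

An elapsed time close to the 30 minute data timeout would fit the picture
of the server giving up first and the client exiting afterwards.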


> The "index tee cannot write [Broken pipe]" stuff is just a symptom of
> the server giving up on the client.  The server shut down the connections
> and the client was still trying to write, so it got a "broken pipe" error.
> The real problem is why the data stream quit moving.
>
> Some other possibilities:
>
>   * An **extremely** busy disk that tar has a hard time getting access
>     to.
>
Judging from ps auxw, it does not appear that the server is doing an awful
lot on the disk. It should not be doing much, as it is mostly just a mail
server for a small number of users.
>   * A busy client and you have compression turned on, so it just cannot
>     get enough CPU.
>
It should have enough CPU for that, I believe. It is a PII 350 and, like I
said, it does not do much.
>   * A very large file system that compresses extremely well so tar and
>     gzip are crunching along but not generating any (enough) output.
>
Small filesystem. The whole server has only about 3-4 GB in use.

>   * A broken disk that is very slow to respond (e.g. lots of retries).
>
I don't see any evidence of this on the server.


>   * Some kind of networking problem, software or hardware.  You might
>     try some big ftp put's from the client to /dev/null on the server
>     and see what happens.
>
Most likely. I was at first under the impression from the output that it was
a software problem. I am going to look further into the hardware aspects of
it now.
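
If ftp is not handy, the same kind of bulk-transfer test can be roughed
out in Python: one side accepts a connection and throws the data away,
the other sends a fixed number of bytes and times it. This is a sketch
under assumptions, not a proper benchmark. All of the names are made up
for illustration, and loopback_demo only checks the code over 127.0.0.1.
For a real test, run_sink would run on the server and send_bytes on the
client.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes handed to each send/recv call

def run_sink(listener, result):
    """Accept one connection, discard everything received, record the byte count."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read: the sender closed the connection
                break
            total += len(data)
    result["bytes"] = total

def send_bytes(host, port, nbytes):
    """Send nbytes of zeros to host:port and return (bytes_sent, seconds)."""
    payload = bytes(CHUNK)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while sent < nbytes:
            piece = payload[: min(CHUNK, nbytes - sent)]
            sock.sendall(piece)
            sent += len(piece)
    return sent, time.monotonic() - start

def loopback_demo(nbytes=1_000_000):
    """Run the sink and the sender in one process over 127.0.0.1 as a self-check."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    result = {}
    sink = threading.Thread(target=run_sink, args=(listener, result), daemon=True)
    sink.start()
    sent, seconds = send_bytes("127.0.0.1", listener.getsockname()[1], nbytes)
    sink.join()
    listener.close()
    return sent, result["bytes"], seconds
```

Dividing the bytes sent by the seconds gives a rough throughput figure. A
rate far below what 10baseT should manage, or a transfer that stalls
outright, would point toward the duplex or hardware side.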

> >Ryan Williams
>
> John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
>


Thanks
