Paul Bijnens wrote at 12:33 +0200 on May 27, 2008:
> On 2008-05-25 18:55, jehan procaccia wrote:
> > hello,
> >
> > some clients with "big" partitions (>100Gbytes) freezes my amdump, I
> > usually get dumps errors which cannot end properly.
> > I have 2 questions,
> > 1) how can I resolve that "client" error, timeout or whatever ?
>
> This look suspiciously like the problem (and solution) described here:
>
> http://wiki.zmanda.com/index.php/Mesg_read:_Connection_reset_by_peer

Note that the wiki page's explanation of tcp_keepalive_time (spelled 'net.inet.tcp.keepidle' on FreeBSD - see http://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sektion=0&manpath=FreeBSD+7.0-RELEASE&format=html) is not quite accurate. That setting is how long a connection must sit idle before the TCP stack starts sending keepalives. Once it does, it sends N keepalives (N is configurable) at a separate interval (/proc/sys/net/ipv4/tcp_keepalive_intvl seconds on Linux, net.inet.tcp.keepintvl ms on FreeBSD). The explanation on the wiki page seems to imply that tcp_keepalive_time is itself the interval between keepalives. That's not quite a correct interpretation of those particular settings (at least not the Linux one - I'm not sure about the Solaris setting).

For the problem this wiki page is addressing, you probably want to lower the idle timeout, and possibly adjust the subsequent sending interval and the count as well. This is what the suggested command for Linux actually does - it's just the explanatory text that is unclear.

Of course, these settings will affect all TCP connections on the machine by default. Keepalives are not really intended for determining application-level socket status, but rather the status of the host (or comms link). If you are hitting these keepalive timeouts, you may really want to figure out why (an overly aggressive firewall, for instance).

It might be worthwhile for amanda to grow its own keepalive mechanism (similar to ssh) to address this issue, so one doesn't have to change settings for all TCP connections.
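
To make the three knobs concrete, here is a rough Python sketch. It is my own illustration, not anything amanda does today. The function names and the 300/60/5 values are made up for the example. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux per-socket option names as Python's socket module exposes them, so the code skips any that the platform lacks. Note that this only overrides the TCP keepalive timers for one socket; it is not the application-level mechanism that ssh uses.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Rough time from the last traffic until a dead peer is given up on.

    idle     -- seconds of silence before the first keepalive is sent
    interval -- seconds between keepalives once they have started
    probes   -- number of unanswered keepalives before the stack gives up
    """
    return idle + interval * probes


def enable_keepalive(sock, idle=300, interval=60, probes=5):
    """Enable keepalives on one socket only (values are illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the system-wide settings. These constants are
    # not defined on every platform, so set only the ones that exist.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    # With the commonly documented Linux defaults (7200 s idle, 75 s
    # interval, 9 probes) the idle time dominates the total.
    print(keepalive_detection_seconds(7200, 75, 9))  # 7875

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    s.close()
```

The arithmetic shows why lowering only the interval does little for an idle dump connection: nothing is sent at all until the idle time has passed, so that is the value a firewall's idle timeout has to be compared against.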