Hello,
because these problems are so important, we now have three ways to protect
the clients from freezing when they lose their connection:
1. TCP-Keepalive tuning (the cleanest way)
/proc/sys/net/ipv4/tcp_keepalive_time = 600
/proc/sys/net/ipv4/tcp_keepalive_intvl = 10
/proc/sys/net/ipv4/tcp_keepalive_probes = 50
2. Using 'nbd-client' with the '-persist' option (sometimes helps when 1. fails)
3. Using a 'cron' script, which checks every minute ...
     if (the connection is lost) {
         if (nobody uses that client) {
             reboot / shutdown
         }
     }
Remember that the programs 'reboot/shutdown/poweroff' and their libraries
have to be cached before the connection breaks, because they can no longer
be loaded from the NBD device afterwards.
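
The check in option 3 could be sketched in shell roughly as follows. This is
only an illustration under assumptions, not the script actually used in this
setup: the device path /dev/nbd0 is taken from the nbd-client call quoted
further down, testing the connection with 'nbd-client -c' assumes an
nbd-client version that supports that flag, the 'who' test is a placeholder
for whatever "nobody uses that client" means on a given installation, and all
function names are made up for the example.

```shell
#!/bin/sh
# Watchdog sketch for option 3: reboot an unused client whose NBD connection
# is gone. All names and checks are illustrative assumptions.

NBD_DEV="${NBD_DEV:-/dev/nbd0}"   # device path as used elsewhere in this thread

# Assumption: 'nbd-client -c DEV' exits 0 while the device is connected.
connection_lost() {
    ! nbd-client -c "$NBD_DEV" >/dev/null 2>&1
}

# Placeholder: treat the client as in use while 'who' lists any session.
client_in_use() {
    [ -n "$(who 2>/dev/null)" ]
}

# Print the action for the current state: ok, wait or reboot.
watchdog_decision() {
    if connection_lost; then
        if client_in_use; then
            echo wait
        else
            echo reboot
        fi
    else
        echo ok
    fi
}

# Only act when called with --run, e.g. from a per-minute cron entry.
if [ "${1:-}" = "--run" ] && [ "$(watchdog_decision)" = "reboot" ]; then
    /sbin/reboot -f
fi
```

Keeping the decision in its own function makes it possible to try the logic
without rebooting anything. Everything the script calls (the shell,
nbd-client, who, reboot) falls under the caching remark above.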
Now it works fine, even if somebody does something stupid like turning off
a switch or disconnecting a cable.
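
For option 1, the three values can be set at runtime by writing to the /proc
files listed above. They also determine how long a dead connection goes
unnoticed: the first probe is sent after tcp_keepalive_time seconds without
traffic, and the connection is dropped after tcp_keepalive_probes unanswered
probes sent tcp_keepalive_intvl seconds apart. A small sketch that applies the
values and computes that time (the function names are hypothetical, chosen
only for this example):

```shell
#!/bin/sh
# Sketch for option 1: apply the keepalive values quoted above and compute
# how long a silent peer goes undetected. Function names are illustrative.

KEEPALIVE_TIME=600    # idle seconds before the first probe
KEEPALIVE_INTVL=10    # seconds between probes
KEEPALIVE_PROBES=50   # unanswered probes before the connection is dropped

# Seconds from the last traffic until the kernel gives up: time + intvl * probes.
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Write the three values below a directory (default: the real /proc tree,
# which requires root).
apply_keepalive() {
    dir="${1:-/proc/sys/net/ipv4}"
    echo "$KEEPALIVE_TIME"   > "$dir/tcp_keepalive_time"  &&
    echo "$KEEPALIVE_INTVL"  > "$dir/tcp_keepalive_intvl" &&
    echo "$KEEPALIVE_PROBES" > "$dir/tcp_keepalive_probes"
}

# With the values above this prints 1100 (600 + 10 * 50 seconds).
keepalive_detect_seconds "$KEEPALIVE_TIME" "$KEEPALIVE_INTVL" "$KEEPALIVE_PROBES"
```

Calling apply_keepalive as root sets the values until the next reboot. To keep
them across reboots, the same settings can go into /etc/sysctl.conf as
net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and
net.ipv4.tcp_keepalive_probes. The values only affect sockets that have
keepalive enabled, which is presumably what the 'keepalive' option in the
/etc/hosts.allow line quoted below is for.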
Best regards,
Wojtek
Patrick Rady wrote:
> I think we are running into an nbd problem much like you described on the
> LTSP list in November.
>
> If clients are idle for a period of time, they lose connection to the server.
>
> How did you tune TCP keepalive to fix this?
>
> --Patrick
>
> Patrick Rady
> Administrator, npServ
> NEW (Nonprofit Enterprise at Work)
> office 734-998-0160 ext. 212 / fax 734-998-0163
>
> [email protected] / http://www.new.org/
> Ann Arbor Office: 1100 N. Main, Suite 100, Ann Arbor, MI 48104-1059
> Detroit Office: Hannan House, 4750 Woodward Ave., Suite 308, Detroit, MI 48201
>
> ----- Original Message -----
> From: "Wojtek Polcwiartek" <[email protected]>
> To: [email protected]
> Sent: Wednesday, November 5, 2008 3:16:43 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
>
> Hello,
>
> after 1 month we found the solution to our problem :D
> Problem (short):
> after some time clients lose their NBD mounts (log: "Read failed:
> Connection reset by peer"). It is a similar problem to
> https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/113617
>
> Solution:
> Tuning of the parameters of the TCP-Keepalive connection (see
> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html)
> We suspect that our network closes the mount connections. We mostly use
> enterprise-class network components (Cisco 6500 Series).
>
> Our LTSP system runs well. We wanted to share our experience.
>
> Greetings,
> Wojtek
>
> Wojtek Polcwiartek wrote:
>> Hello,
>>
>> we think that the problem is the load balancer (Cisco ACE). Most of the
>> traffic on the servers goes through it. Sniffing showed some strange
>> TCP RST packets.
>> We found a way to work around this problem. It is not the most beautiful
>> solution, but it works for now.
>> We used a script in /etc/rc2.d/ with the following lines:
>> nbd-client -d /dev/nbd0
>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
>>
>> It disconnects the nbd-client and reconnects it with the "persist" option.
>>
>> Is there any reason why the "persist" option isn't used by default? To
>> me the connection seems more robust with it than without it.
>>
>> Is there a clean way to change the parameters of the default nbd-connection?
>>
>>
>> Thanks for help!
>>
>> Wojtek
>>
>> Gideon Romm wrote:
>>> The only other thing I can think of is your switch.
>>>
>>> Is it a managed switch? Some switches will not allow a connection to be
>>> active and idle for an extended period of time.
>>>
>>> To test this, connect a single client to the LTSP server via crossover
>>> cable and let it sit for a day, and see if it disconnects, too. If it
>>> does not, then the problem is the switch, and you should figure out what
>>> setting in the switch needs to be changed, or use a dumber switch. :)
>>>
>>> -Gideon
>>>
>>>
>>> On Tue, 2008-09-30 at 08:39 +0200, Wojtek Polcwiartek wrote:
>>>> Hello,
>>>>
>>>> yes, we do have this line in /etc/hosts.allow.
>>>> We are still working on this (wireshark etc.) :/
>>>> Are any TCP/UDP ports other than 69 and 2000 needed?
>>>> Any other ideas?
>>>>
>>>> Greetings,
>>>>
>>>> Wojtek
>>>>
>>>> Gideon Romm wrote:
>>>>> Do you have the following line in /etc/hosts.allow:
>>>>>
>>>>> nbdrootd: ALL: keepalive
>>>>>
>>>>> -Gadi
>>>>>
>>>>> On Fri, 2008-09-26 at 12:04 +0200, Wojtek Polcwiartek wrote:
>>>>>> Hello,
>>>>>>
>>>>>> we are trying to implement LTSP in a PC pool (about 200 thin clients) for
>>>>>> students at the Technical University of Berlin (we are students too). The
>>>>>> work is almost done and we are now in the test phase. Here we hit an error
>>>>>> which could stop our project :/ We use lt...@hardy.
>>>>>> Our problem: the connection between nbd-client and nbd-server breaks.
>>>>>>
>>>>>> The message on the clients says (after switching to another terminal):
>>>>>> "nbd0: Attempted to send on closed socket"
>>>>>>
>>>>>> The logs at the server:
>>>>>> - Connection
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbdrootd[11882]: connect from 130.149.10.132 (130.149.10.132)
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: connect from 130.149.10.132, assigned file is /opt/ltsp/images/i386.img
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: Size of exported file/device is 228229120
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbdrootd[11903]: connect from 130.149.10.131 (130.149.10.131)
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: connect from 130.149.10.131, assigned file is /opt/ltsp/images/i386.img
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: Size of exported file/device is 228229120
>>>>>>
>>>>>> - Connection lost
>>>>>> Sep 24 17:56:08 lts02 nbd_server[11883]: Read failed: Connection reset by peer
>>>>>> Sep 24 17:56:08 lts02 nbd_server[11904]: Read failed: Connection reset by peer
>>>>>>
>>>>>>
>>>>>> Do you have any idea why this could happen?
>>>>>>
>>>>>> Which TCP ports does LTSP need to work properly? We opened 69 (tftp) and
>>>>>> 2000 (nbd-server). Our network infrastructure works well: we did not
>>>>>> notice any high-traffic periods.
>>>>>>
>>>>>> Our hardware configuration:
>>>>>> 2x servers (4x 3 GHz, 4 GB RAM), hardware load balancer
>>>>>> about 200x HP t5725, t5735 and t5525
>>>>>>
>>>>>>
>>>>>> I already wrote an email about this error; here are some more
>>>>>> details.
>>>>>>
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>>
>>
>
>
--
Wojtek Polcwiartek
------
tubIT
TU-Berlin
Web : www.tubit.tu-berlin.de
Email : [email protected]
Tel : +49.30.314.28000