Hello,

because these problems are so important, we now have three ways to protect 
the clients from freezing when they lose their connections:

1. TCP-Keepalive tuning (the cleanest way)
/proc/sys/net/ipv4/tcp_keepalive_time = 600
/proc/sys/net/ipv4/tcp_keepalive_intvl = 10
/proc/sys/net/ipv4/tcp_keepalive_probes = 50
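
To make the effect of these three values concrete, here is a small sketch
(Python, not part of our setup) that works out the timing they imply. An idle
connection with keepalive enabled is probed after 600 seconds, so a network
device only sees it as idle for at most about 10 minutes. If the peer does not
answer, the kernel probes every 10 seconds and gives up after 50 unanswered
probes. The function names are illustrative assumptions; the keys are the
dotted sysctl form of the /proc paths above.

```python
# Sketch: timing implied by the three keepalive settings listed above.
KEEPALIVE = {
    "net.ipv4.tcp_keepalive_time": 600,    # idle seconds before the first probe
    "net.ipv4.tcp_keepalive_intvl": 10,    # seconds between unanswered probes
    "net.ipv4.tcp_keepalive_probes": 50,   # unanswered probes before giving up
}

def seconds_until_declared_dead(settings):
    """Approximate time from the last traffic until a dead peer is dropped."""
    return (settings["net.ipv4.tcp_keepalive_time"]
            + settings["net.ipv4.tcp_keepalive_intvl"]
            * settings["net.ipv4.tcp_keepalive_probes"])

def sysctl_conf_lines(settings):
    """Render the settings in /etc/sysctl.conf syntax (key = value)."""
    return [f"{key} = {value}" for key, value in settings.items()]

if __name__ == "__main__":
    print("\n".join(sysctl_conf_lines(KEEPALIVE)))
    print("dead peer dropped after about",
          seconds_until_declared_dead(KEEPALIVE), "seconds")
```

The rendered lines can be placed in /etc/sysctl.conf so that the values
survive a reboot. Keepalive only applies to sockets that have it switched on.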

2. Using 'nbd-client' with the '-persist' option (sometimes helps when 1. fails)

3. Using a 'cron' script that checks every minute:
if (the connection is lost) {
        if (nobody uses that client){
                reboot / shutdown
        }
}
Keep in mind that the programs 'reboot/shutdown/poweroff' and their 
libraries have to be cached before the connection breaks, because they can 
no longer be read from the NBD device afterwards.
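
The check above could be sketched roughly as follows. This is not our actual
script, only an illustration in Python. Two things are assumptions: that
'nbd-client -c /dev/nbd0' exits with 0 while the device is connected, and that
the output of 'who' is a good enough test for "nobody uses that client" (on
your clients another test may be needed). All function names are hypothetical.

```python
import subprocess

def nbd_connected(device="/dev/nbd0"):
    """Assumption: 'nbd-client -c DEVICE' exits with 0 while connected."""
    return subprocess.call(["nbd-client", "-c", device],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0

def client_in_use():
    """Placeholder test (assumption): any login listed by 'who' counts."""
    return bool(subprocess.check_output(["who"]).strip())

def should_reboot(connected, in_use):
    """Reboot only if the connection is lost and nobody uses the client."""
    return (not connected) and (not in_use)

def main():
    # Meant to be started by cron once a minute.
    if should_reboot(nbd_connected(), client_in_use()):
        subprocess.call(["reboot"])
```

To use it from cron, main() would have to be called at the end of the file.
The caching remark applies here as well: the interpreter and everything the
script starts must already be in memory when the connection breaks.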

Now it works fine, even if somebody does something stupid like turning off 
a switch or disconnecting a cable.

Best regards,

Wojtek



Patrick Rady wrote:
> I think we are running into an nbd problem much like you described on the 
> LTSP list in November.
> 
> If clients are idle for a period of time, they lose connection to the server.
> 
> How did you tune TCP keepalive to fix this?
> 
> --Patrick
> 
> Patrick Rady
> Administrator, npServ
> NEW (Nonprofit Enterprise at Work)
> office 734-998-0160 ext. 212 / fax 734-998-0163
> 
> [email protected] / http://www.new.org/
> Ann Arbor Office: 1100 N. Main, Suite 100, Ann Arbor, MI 48104-1059
> Detroit Office: Hannan House, 4750 Woodward Ave., Suite 308, Detroit, MI 48201
> ==================================
> Finally! A solution for your nonprofit's tech support headaches. Visit  
> www.new.org/npserv/ to learn more!
> 
> ----- Original Message -----
> From: "Wojtek Polcwiartek" <[email protected]>
> To: [email protected]
> Sent: Wednesday, November 5, 2008 3:16:43 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
> 
> Hello,
> 
> after 1 month we found the solution to our problem :D
> Problem (short):
> After some time, clients lose their NBD mounts (log: "Read failed: 
> Connection reset by peer"). It is a similar problem to 
> https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/113617
> 
> Solution:
> Tuning the TCP keepalive parameters (see 
> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html)
> We suspect that our network closes the mount connections. We use mostly 
> enterprise-class network components (Cisco 6500 Series).
> 
> Our LTSP system runs well. We wanted to share our experience.
> 
> Greetings,
> Wojtek
> 
> 
> 
> 
> 
> 
> 
> Wojtek Polcwiartek wrote:
>> Hello,
>>
>> we think that the problem is the load balancer (Cisco ACE). Most of the 
>> traffic on the servers goes through it. Sniffing showed some strange 
>> TCP RST packets.
>> We found a way to solve this problem. It is not the most beautiful 
>> solution, but it works for now.
>> We used a script in /etc/rc2.d/ with the following lines:
>> nbd-client -d /dev/nbd0
>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
>>
>> It disconnects the nbd-client and connects it again with the "persist" option.
>>
>> Is there any reason why the option "persist" isn't used by default? To 
>> me the connection seems to be more robust than without it.
>>
>> Is there a clean way to change the parameters of the default nbd-connection?
>>
>>
>> Thanks for help!
>>
>> Wojtek
>>
>>
>>
>>
>> Gideon Romm wrote:
>>> The only other thing I can think of is your switch.
>>>
>>> Is it a managed switch?  Some switches will not allow a connection to be
>>> active and idle for an extended period of time.
>>>
>>> To test this, connect a single client to the LTSP server via crossover
>>> cable and let it sit for a day, and see if it disconnects, too.  If it
>>> does not, then the problem is the switch, and you should figure out what
>>> setting in the switch needs to be changed, or use a dumber switch.  :)
>>>
>>> -Gideon
>>>
>>>
>>> On Tue, 2008-09-30 at 08:39 +0200, Wojtek Polcwiartek wrote:
>>>> Hello,
>>>>
>>>> yes, we do have this line in /etc/hosts.allow.
>>>> We are still working on this (wireshark etc.) :/
>>>> Are TCP/UDP ports other than 69 and 2000 needed?
>>>> Any other ideas?
>>>>
>>>> Greetings,
>>>>
>>>> Wojtek
>>>>
>>>>
>>>>
>>>> Gideon Romm wrote:
>>>>> Do you have the following line in /etc/hosts.allow:
>>>>>
>>>>> nbdrootd: ALL: keepalive
>>>>>
>>>>> -Gadi
>>>>>
>>>>> On Fri, 2008-09-26 at 12:04 +0200, Wojtek Polcwiartek wrote:
>>>>>> Hello,
>>>>>>
>>>>>> we are trying to implement LTSP in a PC pool (about 200 thin clients) for 
>>>>>> students at Tech.Univ. of Berlin (we are students too). The work is 
>>>>>> almost done. We are now in the test phase. Here we got an error which 
>>>>>> could stop our project :/ We use lt...@hardy.
>>>>>> Our problem: The connection between nbd-client and nbd-server breaks.
>>>>>>
>>>>>> The message on the clients says (after switching to another terminal):
>>>>>> "nbd0: Attempted to send on closed socket"
>>>>>>
>>>>>> The logs at the server:
>>>>>> - Connection
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbdrootd[11882]: connect from 
>>>>>> 130.149.10.132 (130.149.10.132)
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: connect from 
>>>>>> 130.149.10.132, assigned file is /opt/ltsp/images/i386.img
>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: Size of exported 
>>>>>> file/device is 228229120
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbdrootd[11903]: connect from 
>>>>>> 130.149.10.131 (130.149.10.131)
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: connect from 
>>>>>> 130.149.10.131, assigned file is /opt/ltsp/images/i386.img
>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: Size of exported 
>>>>>> file/device is 228229120
>>>>>>
>>>>>> - Connection lost
>>>>>> Sep 24 17:56:08 lts02 nbd_server[11883]: Read failed: Connection reset 
>>>>>> by peer
>>>>>> Sep 24 17:56:08 lts02 nbd_server[11904]: Read failed: Connection reset 
>>>>>> by peer
>>>>>>
>>>>>>
>>>>>> Do you have any idea why this could happen?
>>>>>>
>>>>>> Which TCP ports are needed for LTSP to work well? We opened 69 (tftp) and 
>>>>>> 2000 (nbd-server). Our network infrastructure works well: we couldn't 
>>>>>> find any periods of high traffic.
>>>>>>
>>>>>> Our H/W-Configuration:
>>>>>> 2xServers (4x3GHz, 4GB Ram), H/W load balancer
>>>>>> about 200x HP t5725, t5735 and t5525
>>>>>>
>>>>>>
>>>>>> I already wrote an email about this error, but now I can provide some 
>>>>>> details.
>>>>>>
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>>
>>
> 
> 


-- 
Wojtek Polcwiartek

------
tubIT
TU-Berlin
Web   : www.tubit.tu-berlin.de
Email : [email protected]
Tel   : +49.30.314.28000
