Hello,

You can find the answer by reading further down (in one of the older emails) :P
This part of the solution only works 'sometimes'.
IMO, the most robust way is to use all 3 solutions together.


 >>> We found some way to solve this problem. It is not the most beautiful
 >>> solution, but it works for now.
 >>> We used a script in /etc/rc2.d/ with the following lines:
 >>> nbd-client -d /dev/nbd0
 >>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
 >>>
 >>> It disconnects the nbd-client and connects it again with the "persist" option.
 >>>
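
For reference, the quoted script could be written roughly like this. It is only a sketch: the two nbd-client calls are the ones quoted above, but the NBD_CLIENT variable and the reconnect_persist function name are hypothetical additions (they make a dry run possible) and were not part of the original script.

```shell
#!/bin/sh
# Sketch of the reconnect script quoted above.
# NBD_CLIENT is a hypothetical override so the commands can be dry-run.
NBD_CLIENT="${NBD_CLIENT:-nbd-client}"

# reconnect_persist SERVER [PORT] [DEVICE]
# Port 2000 and /dev/nbd0 are the values used in the quoted script.
reconnect_persist() {
    server="$1"
    port="${2:-2000}"
    dev="${3:-/dev/nbd0}"

    if [ -z "$server" ]; then
        echo "usage: reconnect_persist SERVER [PORT] [DEVICE]" >&2
        return 1
    fi

    # Drop the existing connection; a failure here is ignored on purpose,
    # because the device may already be disconnected.
    "$NBD_CLIENT" -d "$dev"

    # Connect again, this time with the persist option.
    "$NBD_CLIENT" "$server" "$port" "$dev" -persist
}
```

Called as `reconnect_persist <IP-Address>` from the script in /etc/rc2.d/, it runs the same two commands as in the quote.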

Best regards,

Wojtek


Patrick Rady wrote:
> Thank you very much for your response!  It is very helpful...
> 
> With regard to 2. using the -persist option....  where do you do this?
> 
> Regards,
> 
> --Patrick
> 
> 
> ----- Original Message -----
> From: "Wojtek Polcwiartek" <[email protected]>
> To: [email protected]
> Cc: "Patrick Rady" <[email protected]>
> Sent: Friday, January 23, 2009 10:44:29 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
> 
> Hello,
> 
> because of the importance of these problems we now have 3 ways to protect 
> the clients from freezing when they lose their connections:
> 
> 1. TCP-Keepalive tuning (the cleanest way)
> /proc/sys/net/ipv4/tcp_keepalive_time = 600
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 10
> /proc/sys/net/ipv4/tcp_keepalive_probes = 50
> 
> 2. Using 'nbd-client' with the '-persist' option (sometimes helps when 1. fails)
> 
> 3. Using a 'cron' script that checks every minute:
> if (the connection is lost) {
>       if (nobody uses that client){
>               reboot / shutdown
>       }
> }
> Here you have to remember that the programs 'reboot/shutdown/poweroff' 
> and their libs have to be cached before the connection breaks.
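
The pseudocode in point 3 could be fleshed out along these lines. This is a hypothetical sketch, not the script we actually run: the two checks (asking `nbd-client -c` whether the device is still connected, and treating an empty `who` listing as "nobody uses that client") are assumptions, and the names NBD_DEV, REBOOT_CMD, connection_lost, client_in_use and watchdog are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog for point 3. Both checks are assumptions and
# should be replaced with whatever test fits the actual setup.
NBD_DEV="${NBD_DEV:-/dev/nbd0}"

connection_lost() {
    # Assumption: "nbd-client -c" exits non-zero when the device
    # is not connected.
    ! nbd-client -c "$NBD_DEV" >/dev/null 2>&1
}

client_in_use() {
    # Assumption: an active user shows up in "who" on the client.
    [ -n "$(who)" ]
}

watchdog() {
    if connection_lost; then
        if ! client_in_use; then
            # REBOOT_CMD is a hypothetical override (e.g. "poweroff").
            "${REBOOT_CMD:-reboot}"
        fi
    fi
}
```

A cron entry that runs every minute (`* * * * *`) would then source this file and call `watchdog`. The caching caveat applies here as well: every program the checks call has to be in memory before the connection breaks.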
> 
> Now it works fine, even if somebody does something stupid like turning off 
> a switch or disconnecting a cable.
> 
> Best regards,
> 
> Wojtek
> 
> 
> 
> Patrick Rady wrote:
>> I think we are running into an nbd problem much like you described on the 
>> LTSP list in November.
>>
>> If clients are idle for a period of time, they lose connection to the server.
>>
>> How did you tune TCP keepalive to fix this?
>>
>> --Patrick
>>
>> Patrick Rady
>> Administrator, npServ
>> NEW (Nonprofit Enterprise at Work)
>> office 734-998-0160 ext. 212 / fax 734-998-0163
>>
>> [email protected] / http://www.new.org/
>> Ann Arbor Office: 1100 N. Main, Suite 100, Ann Arbor, MI 48104-1059
>> Detroit Office: Hannan House, 4750 Woodward Ave., Suite 308, Detroit, MI 
>> 48201
>> ==================================
>> Finally! A solution for your nonprofit's tech support headaches. Visit  
>> www.new.org/npserv/ to learn more!
>>
>> ----- Original Message -----
>> From: "Wojtek Polcwiartek" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, November 5, 2008 3:16:43 AM GMT -05:00 US/Canada Eastern
>> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
>>
>> Hello,
>>
>> after 1 month we found the solution to our problem :D
>> Problem (short):
>> after some time clients lose their NBD mounts (Log: "Read failed: 
>> Connection reset by peer"). It is a similar problem to 
>> https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/113617
>>
>> Solution:
>> Tuning the TCP keepalive parameters (see 
>> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html).
>> We suspect that our network closes the mount connections. We use mostly 
>> enterprise-class network components (Cisco 6500 Series).
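
A minimal sketch of this tuning, using the values 600 / 10 / 50 from the January follow-up as defaults. The PROC_NET variable and the set_keepalive function name are hypothetical; PROC_NET only exists so the function can be tried against a scratch directory instead of the real /proc.

```shell
#!/bin/sh
# Sketch: write the TCP keepalive settings at runtime (needs root when
# pointed at the real /proc). PROC_NET is a hypothetical override.
PROC_NET="${PROC_NET:-/proc/sys/net/ipv4}"

# set_keepalive [TIME] [INTVL] [PROBES]
set_keepalive() {
    # Seconds a connection may be idle before the first probe is sent.
    echo "${1:-600}" > "$PROC_NET/tcp_keepalive_time"
    # Seconds between two probes.
    echo "${2:-10}" > "$PROC_NET/tcp_keepalive_intvl"
    # Unanswered probes before the connection is considered dead.
    echo "${3:-50}" > "$PROC_NET/tcp_keepalive_probes"
}
```

Values written this way are lost on reboot. The same settings can be made permanent in /etc/sysctl.conf under the keys net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes. They only affect sockets that have keepalive enabled, which is presumably why the "keepalive" option in the /etc/hosts.allow line quoted further down matters.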
>>
>> Our LTSP system runs well. We wanted to share our experience.
>>
>> Greetings,
>> Wojtek
>>
>>
>>
>>
>>
>>
>>
>> Wojtek Polcwiartek wrote:
>>> Hello,
>>>
>>> we think that the problem is the load balancer (Cisco ACE). Most of the 
>>> traffic to the servers goes through it. Sniffing showed some strange 
>>> TCP RST packets.
>>> We found some way to solve this problem. It is not the most beautiful 
>>> solution, but it works for now.
>>> We used a script in /etc/rc2.d/ with the following lines:
>>> nbd-client -d /dev/nbd0
>>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
>>>
>>> It disconnects the nbd-client and connects it again with the "persist" option.
>>>
>>> Is there any reason why the "persist" option isn't used by default? For 
>>> me the connection seems to be more robust with it than without it.
>>>
>>> Is there a clean way to change the parameters of the default nbd-connection?
>>>
>>>
>>> Thanks for help!
>>>
>>> Wojtek
>>>
>>>
>>>
>>>
>>> Gideon Romm wrote:
>>>> The only other thing I can think of is your switch.
>>>>
>>>> Is it a managed switch?  Some switches will not allow a connection to be
>>>> active and idle for an extended period of time.
>>>>
>>>> To test this, connect a single client to the LTSP server via crossover
>>>> cable and let it sit for a day, and see if it disconnects, too.  If it
>>>> does not, then the problem is the switch, and you should figure out what
>>>> setting in the switch needs to be changed, or use a dumber switch.  :)
>>>>
>>>> -Gideon
>>>>
>>>>
>>>> On Tue, 2008-09-30 at 08:39 +0200, Wojtek Polcwiartek wrote:
>>>>> Hello,
>>>>>
>>>>> yes, we do have this line in /etc/hosts.allow.
>>>>> We are still working on this (wireshark etc.) :/
>>>>> Are other tcp-/udp-ports than 69 and 2000 needed?
>>>>> Any other ideas?
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Wojtek
>>>>>
>>>>>
>>>>>
>>>>> Gideon Romm wrote:
>>>>>> Do you have the following line in /etc/hosts.allow:
>>>>>>
>>>>>> nbdrootd: ALL: keepalive
>>>>>>
>>>>>> -Gadi
>>>>>>
>>>>>> On Fri, 2008-09-26 at 12:04 +0200, Wojtek Polcwiartek wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> we are trying to implement LTSP in a PC pool (about 200 thin clients) for 
>>>>>>> students at Tech.Univ. of Berlin (we are students too). The work is 
>>>>>>> almost done. We are now in the test phase. Here we got an error which 
>>>>>>> could stop our project :/ We use lt...@hardy.
>>>>>>> Our problem: The connection between nbd-client and nbd-server breaks.
>>>>>>>
>>>>>>> The message at the clients says (After switching to another terminal):
>>>>>>> "nbd0: Attempted to send on closed socket"
>>>>>>>
>>>>>>> The logs at the server:
>>>>>>> - Connection
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbdrootd[11882]: connect from 
>>>>>>> 130.149.10.132 (130.149.10.132)
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: connect from 
>>>>>>> 130.149.10.132, assigned file is /opt/ltsp/images/i386.img
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: Size of exported 
>>>>>>> file/device is 228229120
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbdrootd[11903]: connect from 
>>>>>>> 130.149.10.131 (130.149.10.131)
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: connect from 
>>>>>>> 130.149.10.131, assigned file is /opt/ltsp/images/i386.img
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: Size of exported 
>>>>>>> file/device is 228229120
>>>>>>>
>>>>>>> - Connection lost
>>>>>>> Sep 24 17:56:08 lts02 nbd_server[11883]: Read failed: Connection reset 
>>>>>>> by peer
>>>>>>> Sep 24 17:56:08 lts02 nbd_server[11904]: Read failed: Connection reset 
>>>>>>> by peer
>>>>>>>
>>>>>>>
>>>>>>> Do you have any idea why this could happen?
>>>>>>>
>>>>>>> Which tcp-ports are needed for a well-working LTSP? We opened 69 (tftp) 
>>>>>>> and 2000 (nbd-server). Our network infrastructure works well: we couldn't 
>>>>>>> notice any high-traffic periods.
>>>>>>>
>>>>>>> Our H/W-Configuration:
>>>>>>> 2xServers (4x3GHz, 4GB Ram), H/W load balancer
>>>>>>> about 200x HP t5725, t5735 and t5525
>>>>>>>
>>>>>>>
>>>>>>> I already wrote an email about this error, but now I can provide some 
>>>>>>> details.
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>>
>>
> 
> 


-- 
Wojtek Polcwiartek

------
tubIT
TU-Berlin
Web   : www.tubit.tu-berlin.de
Email : [email protected]
Tel   : +49.30.314.28000

_____________________________________________________________________
Ltsp-discuss mailing list.   To un-subscribe, or change prefs, goto:
      https://lists.sourceforge.net/lists/listinfo/ltsp-discuss
For additional LTSP help,   try #ltsp channel on irc.freenode.net
