Hello,

You can find the answer by reading further :P (it is in an older email, quoted again below). That part of the solution works only 'sometimes'. IMO, the most robust way is to use all 3 solutions together.
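For the keepalive part (solution 1 in the quoted mail), here is a minimal sketch of how the three values could be applied and what they add up to. The function names and the optional directory argument are hypothetical, added only so the script can be tried on a scratch directory before touching the real /proc files; the values are the ones quoted in this thread.

```shell
#!/bin/sh
# Values quoted in this thread: idle seconds before the first probe,
# seconds between probes, number of unanswered probes before giving up.
KEEPALIVE_TIME=600
KEEPALIVE_INTVL=10
KEEPALIVE_PROBES=50

# Write the three values into the given directory.
# The directory argument is a hypothetical convenience for dry runs;
# without it the real /proc/sys/net/ipv4 is used (needs root).
apply_keepalive() {
    ka_dir=${1:-/proc/sys/net/ipv4}
    echo "$KEEPALIVE_TIME"   > "$ka_dir/tcp_keepalive_time"   || return 1
    echo "$KEEPALIVE_INTVL"  > "$ka_dir/tcp_keepalive_intvl"  || return 1
    echo "$KEEPALIVE_PROBES" > "$ka_dir/tcp_keepalive_probes" || return 1
}

# Rough number of seconds of silence after which the kernel drops a
# connection to a dead peer: idle time plus all unanswered probes.
dead_peer_timeout() {
    echo $(( $KEEPALIVE_TIME + $KEEPALIVE_INTVL * $KEEPALIVE_PROBES ))
}
```

Calling apply_keepalive without an argument (as root) writes to /proc/sys/net/ipv4, and the change is lost on reboot unless it is repeated from a boot script. With these values an idle connection gets its first probe after 10 minutes instead of the kernel default of 7200 seconds. Note that the settings only affect sockets that have SO_KEEPALIVE enabled.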
>>> We found some way to solve this problem. It is not the most beautiful
>>> solution, but it works for now.
>>> We used a script in /etc/rc2.d/ with the following lines:
>>> nbd-client -d /dev/nbd0
>>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
>>>
>>> It disconnects the nbd-client and connects it again with the "persist" option.
>>>

Best regards,
Wojtek

Patrick Rady wrote:
> Thank you very much for your response! It is very helpful...
>
> With regard to 2. using the -persist option.... where do you do this?
>
> Regards,
>
> --Patrick
>
>
> ----- Original Message -----
> From: "Wojtek Polcwiartek" <[email protected]>
> To: [email protected]
> Cc: "Patrick Rady" <[email protected]>
> Sent: Friday, January 23, 2009 10:44:29 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
>
> Hello,
>
> because of the importance of these problems we now have 3 ways to protect
> the clients from freezing because of losing connections:
>
> 1. TCP-Keepalive tuning (the cleanest way)
>    /proc/sys/net/ipv4/tcp_keepalive_time = 600
>    /proc/sys/net/ipv4/tcp_keepalive_intvl = 10
>    /proc/sys/net/ipv4/tcp_keepalive_probes = 50
>
> 2. Using 'nbd-client' with the '-persist' option (helps sometimes when 1. fails)
>
> 3. Using a 'cron' script, which checks every minute ...
>    if (the connection is lost) {
>        if (nobody uses that client) {
>            reboot / shutdown
>        }
>    }
>    Here you have to remember that the programs 'reboot/shutdown/poweroff'
>    and their libs have to be cached before the connection breaks
>
> Now it works fine: even if somebody does something stupid like turning off
> a switch or disconnecting a cable.
>
> Best regards,
>
> Wojtek
>
>
>
> Patrick Rady wrote:
>> I think we are running into an nbd problem much like you described on the
>> LTSP list in November.
>>
>> If clients are idle for a period of time, they lose connection to the server.
>>
>> How did you tune TCP keepalive to fix this?
>>
>> --Patrick
>>
>> Patrick Rady
>> Administrator, npServ
>> NEW (Nonprofit Enterprise at Work)
>> office 734-998-0160 ext. 212 / fax 734-998-0163
>>
>> [email protected] / http://www.new.org/
>> Ann Arbor Office: 1100 N. Main, Suite 100, Ann Arbor, MI 48104-1059
>> Detroit Office: Hannan House, 4750 Woodward Ave., Suite 308, Detroit, MI
>> 48201
>> ==================================
>> Finally! A solution for your nonprofit's tech support headaches. Visit
>> www.new.org/npserv/ to learn more!
>>
>> ----- Original Message -----
>> From: "Wojtek Polcwiartek" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, November 5, 2008 3:16:43 AM GMT -05:00 US/Canada Eastern
>> Subject: Re: [Ltsp-discuss] nbd-mounts lost: serious problem
>>
>> Hello,
>>
>> after 1 month we found the solution to our problem :D
>> Problem (short):
>> after some time clients lose their NBD-mounts (Log: "Read failed:
>> Connection reset by peer"). It is a similar problem to
>> https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/113617
>>
>> Solution:
>> Tuning of the parameters of the TCP-Keepalive connection (see
>> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html)
>> We suppose our network closes mount-connections. We use mostly
>> enterprise-class network components (Cisco 6500 Series).
>>
>> Our LTSP system runs well. We wanted to share our experience.
>>
>> Greetings,
>> Wojtek
>>
>>
>> Wojtek Polcwiartek wrote:
>>> Hello,
>>>
>>> we think that the problem is the load-balancer (Cisco ACE). Most of the
>>> traffic on the servers goes through it. Sniffing showed some strange
>>> TCP RST packets.
>>> We found some way to solve this problem. It is not the most beautiful
>>> solution, but it works for now.
>>> We used a script in /etc/rc2.d/ with the following lines:
>>> nbd-client -d /dev/nbd0
>>> nbd-client <IP-Address> 2000 /dev/nbd0 -persist
>>>
>>> It disconnects the nbd-client and connects it again with the "persist" option.
>>>
>>> Is there any reason why the option "persist" isn't used by default? For
>>> me the connection seems to be more robust than without it.
>>>
>>> Is there a clean way to change the parameters of the default nbd-connection?
>>>
>>>
>>> Thanks for help!
>>>
>>> Wojtek
>>>
>>>
>>> Gideon Romm wrote:
>>>> The only other thing I can think of is your switch.
>>>>
>>>> Is it a managed switch? Some switches will not allow a connection to be
>>>> active and idle for an extended period of time.
>>>>
>>>> To test this, connect a single client to the LTSP server via crossover
>>>> cable and let it sit for a day, and see if it disconnects, too. If it
>>>> does not, then the problem is the switch, and you should figure out what
>>>> setting in the switch needs to be
>>>> changed, or use a dumber switch. :)
>>>>
>>>> -Gideon
>>>>
>>>>
>>>> On Tue, 2008-09-30 at 08:39 +0200, Wojtek Polcwiartek wrote:
>>>>> Hello,
>>>>>
>>>>> yes, we do have this line in /etc/hosts.allow
>>>>> We are still working on this (wireshark etc.) :/
>>>>> Are other tcp-/udp-ports than 69 and 2000 needed?
>>>>> Any other ideas?
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Wojtek
>>>>>
>>>>>
>>>>>
>>>>> Gideon Romm wrote:
>>>>>> Do you have the following line in /etc/hosts.allow:
>>>>>>
>>>>>> nbdrootd: ALL: keepalive
>>>>>>
>>>>>> -Gadi
>>>>>>
>>>>>> On Fri, 2008-09-26 at 12:04 +0200, Wojtek Polcwiartek wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> we are trying to implement LTSP in a pc-pool (about 200 thin clients) for
>>>>>>> students at Tech.Univ. of Berlin (we are students too). The work is
>>>>>>> almost done. We are now in the test phase.
>>>>>>> Here we got an error, which
>>>>>>> can stop our project :/ We use lt...@hardy.
>>>>>>> Our problem: The connection between nbd-client and nbd-server breaks.
>>>>>>>
>>>>>>> The message at the clients says (after switching to another terminal):
>>>>>>> "nbd0: Attempted to send on closed socket"
>>>>>>>
>>>>>>> The logs at the server:
>>>>>>> - Connection
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbdrootd[11882]: connect from
>>>>>>> 130.149.10.132 (130.149.10.132)
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: connect from
>>>>>>> 130.149.10.132, assigned file is /opt/ltsp/images/i386.img
>>>>>>> ./syslog:Sep 24 16:43:14 lts02 nbd_server[11883]: Size of exported
>>>>>>> file/device is 228229120
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbdrootd[11903]: connect from
>>>>>>> 130.149.10.131 (130.149.10.131)
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: connect from
>>>>>>> 130.149.10.131, assigned file is /opt/ltsp/images/i386.img
>>>>>>> ./syslog:Sep 24 16:43:16 lts02 nbd_server[11904]: Size of exported
>>>>>>> file/device is 228229120
>>>>>>>
>>>>>>> - Connection lost
>>>>>>> Sep 24 17:56:08 lts02 nbd_server[11883]: Read failed: Connection reset
>>>>>>> by peer
>>>>>>> Sep 24 17:56:08 lts02 nbd_server[11904]: Read failed: Connection reset
>>>>>>> by peer
>>>>>>>
>>>>>>>
>>>>>>> Do you have any idea why this could happen?
>>>>>>>
>>>>>>> What tcp-ports are needed for a well-working LTSP? We opened 69 (tftp)
>>>>>>> and
>>>>>>> 2000 (nbd-server). Our network infrastructure works well: we couldn't
>>>>>>> notice any high-traffic time periods.
>>>>>>>
>>>>>>> Our H/W-Configuration:
>>>>>>> 2xServers (4x3GHz, 4GB Ram), H/W load balancer
>>>>>>> about 200x HP t5725, t5735 and t5525
>>>>>>>
>>>>>>>
>>>>>>> I already wrote an email about this error, but now I can provide some
>>>>>>> details.
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>>
>>
>
>

--
Wojtek Polcwiartek
------
tubIT
TU-Berlin
Web   : www.tubit.tu-berlin.de
Email : [email protected]
Tel   : +49.30.314.28000

_____________________________________________________________________
Ltsp-discuss mailing list. To un-subscribe, or change prefs, goto:
https://lists.sourceforge.net/lists/listinfo/ltsp-discuss
For additional LTSP help, try #ltsp channel on irc.freenode.net
