On Wed, 2002-09-25 at 05:19, Tom Lisjac wrote:
>
> I'm new to the list and apologize if this question has been asked
> before. My searches have come up empty so I thought I'd ask here.
>
> Over the Summer, I've set up an LTSP system with 10 terminals at a local
> elementary school.
> I'd like to set these labs up in other schools but the single point
> of failure and lack of scalability make me nervous.
>
> I'd like to add another identical P-III 450 (we have lots of these) as a
> load sharing and fail-over LTSP server. If one server goes down, I'd
> like the lab to simply slow down... not stop. I was wondering if anyone
> could point me at a preferred way of doing this? I have a few ideas but
> I'd rather not re-invent the wheel! :)
>
> My current plan is to use the Linux-HA link to ping-pong the DHCP
> servers on the two machines. Each time a client boots, the active server
> would stop its local DHCP server and pass control to the other box.
> This would effectively split the load. In the event of a failure, a
> heartbeat script could permanently enable the surviving machine's DHCP
> server. The failed box could then be removed and replaced without
> disrupting the classroom environment.
>
> If there's an easier or better way of doing this, any suggestions would
> be greatly appreciated!
Tom,

I've done a fair amount of poking around on this issue; here are my thoughts.

Regarding your idea of alternating DHCP servers: the DHCP protocol is very lightweight; a full negotiation takes only four packets (discover, offer, request, acknowledge). *Starting* the server, however, takes a fair amount of time. My gut feeling, utterly unbacked by any scientific investigation, is that dhcpd puts its greatest load on CPU and memory during startup, so stopping and restarting it on every client boot would likely cost more than it saves. I would run dhcpd on one machine, rsync the dhcpd.leases file from it to the second machine every hour or so, and use the Linux-HA heartbeat to start the second machine's dhcpd whenever the primary fails.

Regarding failover: the Linux-HA failover code works very well for servers like tftpd, apache, dhcpd, and named that perform short-term transactions. With something to keep the two machines' filesystems synchronized (rsync, AFS, shared SCSI, etc.), Linux-HA can fail these services over transparently.

However, Linux-HA still can't handle failing X clients gracefully, particularly complex clients such as GNOME or KDE. You can round-robin your clients, so that some get server1 and some get server2, but when a server goes down it will take out the workstations using that server.

SOLUTIONS:

I might use Linux-HA with rsync for the transactional servers. With LTSP, most of the information coming from these servers is static, so running rsync once a day is probably enough.

To divide the load, you can set up specialized servers: one runs your window managers, another runs all the browsers, another runs your office suite. In this setup, one app can become unavailable while everything else continues to work.

As an alternative, you can use the Linux-HA heartbeat software to set up a fallback server. If the primary server goes down, the workstations will all fail, but users will be able to log into the fallback server almost immediately.
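To make the heartbeat idea concrete, here is a toy Python model of what the standby machine does: it notes when it last heard from the primary, and if nothing arrives within a "deadtime" it runs a takeover action exactly once (for example, starting its own dhcpd). This is only a sketch under stated assumptions, not how Linux-HA is implemented; the class name, the `/etc/init.d/dhcpd` path, and the timing values are hypothetical. Heartbeat's own ha.cf exposes a comparable `deadtime` setting.

```python
import subprocess
from typing import Callable, Optional


class FailoverMonitor:
    """Toy model of heartbeat-style failover on the standby server (hypothetical).

    The standby records when it last heard a heartbeat from the primary.
    If no heartbeat arrives within `deadtime` seconds, the primary is
    declared dead and the takeover action runs exactly once.
    """

    def __init__(self, deadtime: float, takeover: Callable[[], None]) -> None:
        self.deadtime = deadtime
        self.takeover = takeover
        self.last_heartbeat: Optional[float] = None
        self.active = False  # becomes True once this node has taken over

    def record_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat from the primary is received.
        self.last_heartbeat = now

    def primary_dead(self, now: float) -> bool:
        # Assumption: if we have never heard from the primary, don't take over.
        if self.last_heartbeat is None:
            return False
        return now - self.last_heartbeat > self.deadtime

    def check(self, now: float) -> bool:
        """Run the takeover once if the primary looks dead; return whether we are active."""
        if not self.active and self.primary_dead(now):
            self.takeover()
            self.active = True  # one-way: no failback in this sketch
        return self.active


def start_dhcpd() -> None:
    # Hypothetical init-script path; adjust for your distribution.
    subprocess.run(["/etc/init.d/dhcpd", "start"], check=True)


if __name__ == "__main__":
    # Simulated timeline in seconds; print instead of really starting dhcpd.
    mon = FailoverMonitor(deadtime=10.0, takeover=lambda: print("taking over: starting dhcpd"))
    mon.record_heartbeat(0.0)
    for t in (5.0, 9.0, 15.0, 30.0):
        print(t, mon.check(t))
```

The takeover here is one-way, matching the idea of permanently enabling the surviving machine's dhcpd; whether the primary should reclaim its services when it comes back is a separate policy decision that this sketch does not model.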
For the fallback server to work, you need something like NAS so that losing a server doesn't mean losing access to the data. If the data are rapidly changing and critical, you can use AFS or shared SCSI disks.

Does this help?

-David

_____________________________________________________________________
Ltsp-discuss mailing list. To un-subscribe, or change prefs, goto: https://lists.sourceforge.net/lists/listinfo/ltsp-discuss
For additional LTSP help, try #ltsp channel on irc.openprojects.net
