Re: [Ltsp-discuss] Failover and load balancing question....

David Johnston Wed, 25 Sep 2002 15:07:13 -0700

On Wed, 2002-09-25 at 05:19, Tom Lisjac wrote:
> 
> I'm new to the list and apologize if this question has been asked
> before. My searches have come up empty so I thought I'd ask here.
> 
> Over the Summer, I've set up an LTSP system with 10 terminals at a local
> elementary school.
> I'd like to set these labs up in other schools but the single point
> of failure and lack of scalability make me nervous.
>
> I'd like to add another identical P-III 450 (we have lots of these) as a
> load sharing and fail-over LTSP server. If one server goes down, I'd
> like the lab to simply slow down... not stop. I was wondering if anyone
> could point me at a preferred way of doing this? I have a few ideas but
> I'd rather not re-invent the wheel! :)
> 
> My current plan is to use the Linux-HA link to ping-pong the DHCP
> servers on the two machines. Each time a client boots, the active server
> would stop its local DHCP server and pass control to the other box.
> This would effectively split the load. In the event of a failure, a
> heartbeat script could permanently enable the surviving machine's DHCP
> server. The failed box could then be removed and replaced without
> disrupting the classroom environment.
> 
> If there's an easier or better way of doing this, any suggestions would
> be greatly appreciated!


Tom,
I've done a fair amount of poking around on this issue; here are my
thoughts.

Regarding your idea of alternating DHCP servers:
The DHCP protocol is very lightweight; there are only four packets in a
negotiation (discover, offer, request, acknowledge).  However,
*starting* the server takes a fair amount of time.  My gut feeling,
utterly unbacked by any scientific investigation, is that dhcpd puts the
greatest load on CPU and memory during startup.  I would run dhcpd on
one machine, rsync the dhcpd.leases file from the server to the second
machine every hour or so, and use the HA heartbeat to start the second
machine's dhcpd whenever the primary failed.
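
A rough sketch of that standby arrangement, written in Python rather
than as a real heartbeat resource script.  The lease file path, the host
name "server2", and the three-missed-beats threshold are all assumptions
for illustration; in practice Linux-HA's own configuration would make
the takeover decision, and this only shows the shape of the logic.

```python
import subprocess

LEASES = "/var/lib/dhcp/dhcpd.leases"   # assumed path; varies by distribution
STANDBY = "server2"                      # hypothetical standby host name
MISSED_LIMIT = 3                         # assumed: beats missed before takeover


def rsync_command(leases=LEASES, standby=STANDBY):
    """Build the rsync invocation that copies the lease file to the standby."""
    return ["rsync", "-a", leases, f"{standby}:{leases}"]


def sync_leases():
    """Run from cron on the primary, e.g. once an hour."""
    subprocess.run(rsync_command(), check=True)


def should_start_dhcpd(beats, missed_limit=MISSED_LIMIT):
    """Return True when the most recent `missed_limit` heartbeats were all missed.

    `beats` is a list of booleans, oldest first; True means the primary answered.
    """
    recent = beats[-missed_limit:]
    return len(recent) == missed_limit and not any(recent)
```

The standby never runs dhcpd while the primary is answering, so the
startup cost is only paid once, at the moment of failure.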

Regarding failover:
The Linux-HA failover code works very well for servers like tftpd,
apache, dhcpd, and named that perform short-term transactions.  With
something to keep two machines' filesystems synchronized (rsync, AFS,
shared-scsi, etc), Linux-HA can handle failover for these services
transparently.

However, Linux-HA still can't handle failing X clients gracefully,
particularly complex clients such as GNOME or KDE.  You can round-robin
your clients, so that some get server1 and some get server2, but when a
server goes down it will take out the workstations using that server.
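
To make the round-robin trade-off concrete, here is a small Python
sketch.  The workstation and server names are made up; the point is only
that alternating assignment splits the load evenly and that a failure
takes out exactly the workstations mapped to the dead server.

```python
from itertools import cycle


def assign_round_robin(workstations, servers):
    """Map each workstation to a server, alternating through the server list."""
    return dict(zip(workstations, cycle(servers)))


def affected_by(assignment, failed_server):
    """List the workstations whose sessions die when `failed_server` goes down."""
    return [ws for ws, srv in assignment.items() if srv == failed_server]


# Ten terminals, as in the lab described in the question (names hypothetical).
workstations = [f"ws{n:02d}" for n in range(1, 11)]
assignment = assign_round_robin(workstations, ["server1", "server2"])
lost = affected_by(assignment, "server1")
```

With two servers and ten terminals, `lost` holds the five odd-numbered
workstations; the other five keep running.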

SOLUTIONS:
I might use Linux-HA with rsync for the transactional servers.  With
LTSP, most of the information coming from these servers is static, so
running rsync once a day is probably enough.

To divide the load, you can set up specialized servers.  One runs your
window managers, another runs all the browsers, another runs your office
suite.  In this setup, it's possible for one app to be unavailable while
everything else continues to work.
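
As a toy illustration of the specialized-server layout (host names here
are hypothetical), the following Python sketch maps each application
class to its own server and reports what is still usable when a host is
down.

```python
# Hypothetical host names; the three roles are the ones named above.
APP_SERVERS = {
    "window manager": "wm-server",
    "browser": "web-server",
    "office suite": "office-server",
}


def available_apps(down_hosts, app_servers=APP_SERVERS):
    """Return the applications still reachable when `down_hosts` are down."""
    return sorted(app for app, host in app_servers.items()
                  if host not in down_hosts)
```

For example, available_apps({"web-server"}) leaves the office suite and
the window manager running while the browsers are unavailable.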

As an alternative, you can use the Linux-HA heartbeat software to set up
a fallback server.  If the primary server goes down, the workstations
will all fail but they will be able to sign into the fallback server
almost immediately.  For this to work, you have to use something like
NAS so that losing a server doesn't mean losing access to the data.
If the data are rapidly changing and critical, you can use AFS or
shared-scsi disks.

Does this help?

-David


