Re: [expert] Load Balancing & Round Robin

James Sparenberg Tue, 22 Jul 2003 11:52:29 -0700

On Tue, 2003-07-22 at 08:02, Jack Coates wrote:
> On Tue, 2003-07-22 at 07:30, Mark Watts wrote:
> > ...> >
> > > > If you use ldirectord with heartbeat to control the load balancing, you
> > > > just need to set persistent=120 (in the ldirectord.conf) to have a 2
> > > > minute persistence window.
> > > >
> > > > Mark.
> > >
> > > That's fine if you're load-balancing for fault-tolerance; if you're
> > > load-balancing for reasons of load, it doesn't scale because of proxy
> > > servers, NAT, &c. You end up with bad balances, which is bad if one
> > > server ends up handling more than 50% of its capacity. And how many
> > > typical users are in and out of a site within two minutes, anyway? I
> > > read that whole conversation about increasing memory utilization of the
> > > LVS if persistence is kept too long. With RAM costing what it does these
> > > days, just get a few gigs and be done with it.
> > 
> > Why doesn't it scale? You don't have to do round-robin - there are several 
> > algorithms to choose from which give you different loading schemes.
> > 
> 
> It's nothing to do with algorithm, it's to do with chunk size. When a
> bunch of users are coming from behind a proxy and the load-balancer
> sends them all to one server, that server will be overloaded. Using
> source port in addition to IP can help, but then you run the risk of
> mis-assigning persistence. No big deal if a graphic request was sent to
> server B, but stateful goodies need to stay on A obviously. Granted, I
> haven't personally seen it happen since the CacheFlow was exciting new
> technology, but then I've only seen Foundries and Alteons used since
> then.
> 
> > From personal experience, you simply don't notice the front end directors if 
> > you use reasonable kit. The bottleneck is all in the webserver(s).
> > If you give the webservers more ram, they'll tend to cache stuff anyway so 
> > balancing the connections becomes less of a problem.
> > 
> 
> Are your webservers purely presentation perhaps? True three tier design,
> presentation > intelligence > storage == httpd > j2ee > sql? In that
> design the persistence is handled between the intelligence nodes and
> requests can come from either web server.
> 
> > When we hammered our HA search engine during performance testing, the 
> > director(s) didn't even bat an eyelid, and Apache on the webservers didn't 
> > really care too much either. We were mostly waiting on postgres...
> > (Admittedly our directors were a pair of Dell PE1650's with 1.4GHz P3 procs and 
> > 512MB RAM, and the webservers are 2650's with dual 2GHz Xeons)
> > 
> 
> That's to be expected. Save the CPU horsepower for the database.
> 
> > The 120 sec value for persistence was an example anyway, you can use whatever 
> > you like.
> > 
> > >
> > > List price is $1000 per port for a hardware SLB with cookie persistence
> > > support. If the money isn't there it isn't there, but a quick ebay shows
> > > used Foundries going for $300 per port.
> > 
> > What does cookie persistence do that LVS persistence doesn't?
> 
> LVS == map persistence by the SIP:SP > DIP:DP quad. So 10.1.1.1:30800 >
> 10.2.2.2:80 goes to A, but 10.1.1.2:1024 > 10.2.2.2:80 goes to B. If
> 10.1.1.1 is a squid proxy hiding 2500 users and 10.1.1.2 is a single
> workstation, one of two things will happen. One server will get hammered
> with the squid while the other sits idle, or the squid will get balanced
> using SP's and persistence will eventually break, unless the
> intelligence layer is handling it in some other fashion.
> 
> cookie == use the above to send the first session, then set a cookie on
> the browser. The SLB then watches for the cookie and directs to web
> servers based on its contents. This means that you're balancing by the
> browser rather than the SIP:SP, so it's much more even. Granted, one
> user may be doing more work than the other, but it's more likely to even
> out.
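
A rough Python sketch of the difference described in the quoted text above,
under my own assumptions: persistence is keyed on source IP only (the source
port is ignored), a new key is sent to the least-loaded server, and the
function name `balance` and the browser IDs are made up for illustration.
It is not how LVS or any particular SLB is implemented.

```python
def balance(clients, servers, persistence_key):
    """Pin each persistence key to the least-loaded server on first sight.

    Returns the number of clients that end up on each server.
    """
    pinned = {}                       # persistence key -> server
    load = {s: 0 for s in servers}    # server -> client count
    for client in clients:
        key = persistence_key(client)
        if key not in pinned:
            # First request for this key: pick the least-loaded server.
            pinned[key] = min(servers, key=lambda s: load[s])
        load[pinned[key]] += 1
    return load


# Hypothetical traffic: 2500 users behind one squid proxy, plus one workstation.
clients = [("10.1.1.1", f"browser-{i}") for i in range(2500)]
clients.append(("10.1.1.2", "workstation"))

# Persistence by source IP: the whole proxy lands on one server.
by_source_ip = balance(clients, ["A", "B"], lambda c: c[0])
# Persistence by per-browser cookie: each browser is balanced on its own.
by_cookie = balance(clients, ["A", "B"], lambda c: c[1])

print(by_source_ip)
print(by_cookie)
```

With source-IP persistence the proxy's users all stick to whichever server
saw the proxy first; with a per-browser key the same traffic splits almost
evenly.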


I can see cookies working in an environment where users come in, take
a reasonably sane amount of time, and leave.  But what about the user
who does what most do: browses, leaves to do something else, and comes
back later?  Take this scene.

Boxes A and B each have a max of 10 users (low numbers make for easier
pictures *grin*).  Now 20 people come in: 10 are given A cookies and 10
are given B cookies.  So far so good.  15 leave and don't come back; the
other 5, all with cookies for A, leave their browsers open and do
something else.  Now 12 new people come in, 6 going to A and 6 to B, and
at the same time the 5 people with A cookies decide to go to the next
window.  So A has 11 and B has 6, with A overloaded.  The solution, I
guess, would be to give the cookies a short TTL so that the original
remaining 5 get new cookies when they restart.  In that case you've
moved the session management from the load balancer to the cookie
manager.  I'd also be curious what happens when the five with cookies
for A come back and box A has died.  Maybe I'm slow (I haven't had much
sleep of late, which could be the problem), but it doesn't look right
somehow.  If, however, it's working, no problem.
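
To make the arithmetic in that scene concrete, here is a small Python
sketch.  It assumes the balancer only counts active users, sends each new
user to the least-loaded box, and cannot see idle cookie holders; the names
`admit` and `scene` are mine, not from any real load balancer.

```python
CAPACITY = 10  # max users per box, as in the scene above


def admit(active, count):
    """Send `count` new users, one at a time, to the least-loaded box."""
    for _ in range(count):
        box = min(active, key=active.get)
        active[box] += 1


def scene(cookies_expired):
    """Replay the scene; return the final number of active users per box."""
    active = {"A": 0, "B": 0}
    admit(active, 20)              # 20 arrive: 10 get A cookies, 10 get B

    # 15 leave for good and 5 A-cookie holders go idle, so the balancer
    # now sees no active users on either box.
    active = {"A": 0, "B": 0}
    idle_with_a_cookie = 5

    admit(active, 12)              # 12 new users: 6 to A, 6 to B

    if cookies_expired:
        # Short TTL: the returning 5 are balanced like any new user.
        admit(active, idle_with_a_cookie)
    else:
        # Long-lived cookies: the returning 5 are forced back onto A.
        active["A"] += idle_with_a_cookie
    return active


for expired in (False, True):
    result = scene(expired)
    overloaded = [box for box, n in result.items() if n > CAPACITY]
    print(expired, result, overloaded)
```

With long-lived cookies A ends up over its limit while B has room; with
expired cookies the same 17 users fit on both boxes.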

The only other thing I'm curious about is load levels.  I was talking
with some hardware-oriented people last night, and one of the points
was load levels of CPUs and hardware.  One of them pointed out that he
was doing some research on underloaded servers and how underloading
affects performance, meaning too many servers, each without enough work
to stay within its "power curve".  His point was that, with the number
of energy-saving features built into hardware these days, if you drop
the load too low then the box constantly has to "re-awaken" various
hardware components, which actually increases the time to respond (TTR)
to various events.  When most people see an increase in TTR, the
reaction is "We need more servers", which is actually counterproductive.
What he is advocating is a setup where, if you have say 3 servers, 2
are active and 1 is in hot standby.  As long as load on the first two
is below 90%, server 3 stays inactive.  If the combined load on the
first 2 is below 90% of the capacity of 1 box, then you have 2 in hot
standby and 1 in active use.  The concept is to keep each box either in
a state where the hardware never tries to go to sleep, or totally
asleep.  Should be interesting.
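
A minimal Python sketch of that standby rule as I understand it, assuming
identical boxes and a single combined load figure in the same units as box
capacity.  The function name and parameters are hypothetical; the 90%
threshold is the one from the description above.

```python
def active_boxes_needed(total_load, box_capacity, box_count, threshold_pct=90):
    """Return how many boxes should be awake for the given combined load.

    Picks the smallest number of boxes whose combined capacity keeps the
    load below `threshold_pct` percent; the rest stay in hot standby.
    Integer arithmetic is used to avoid floating-point edge cases.
    """
    for active in range(1, box_count + 1):
        if total_load * 100 < threshold_pct * active * box_capacity:
            return active
    # Even with every box awake the load is at or above the threshold.
    return box_count


# Hypothetical example: 3 boxes, each able to carry 100 units of load.
for load in (80, 150, 200):
    awake = active_boxes_needed(load, 100, 3)
    print(load, "->", awake, "active,", 3 - awake, "in hot standby")
```

The idea is only to show the threshold logic; waking a standby box in time
to absorb a spike is a separate problem this sketch ignores.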

James
