Re: My Scalable Architecture using HAProxy

Pedro Mata-Mouros Thu, 03 Jan 2013 02:06:47 -0800

Also Kevin, I don't really know what the database usage profile of your app is, 
but I'd immediately rule out installing the DB on the web servers, especially 
having two MySQL instances on *each* machine that will be serving PHP...


Cheers,
Pedro.

On 3 Jan 2013, at 09:25, KT Walrus <[email protected]> wrote:

>> basically, you need persistence :)
> 
> Well, I only need persistence to optimize traffic flow so the correct 
> sessionDB is used (eliminating a network hop).  But, the system will still 
> function without persistence (in HAProxy) as the PHP code will know which 
> sessionDB it needs to use for a given user.  In this case, persistence can be 
> ensured by the PHP code even if HAProxy routes to a suboptimal initial 
> backend.
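
A minimal sketch of the lookup described above, written in Python rather than the PHP the site would use. The cookie name `sessdb`, the backend names, and the helper `pick_session_db` are hypothetical; the thread only says the application knows which sessionDB a given user needs.

```python
# Hypothetical names throughout: the thread does not specify a cookie
# name, backend identifiers, or how the application stores the choice.
BACKENDS = ["backend1", "backend2", "backend3", "backend4"]


def pick_session_db(cookies, local_backend):
    """Return (session_db, needs_network_hop) for one request.

    cookies: dict of request cookies.
    local_backend: the backend the load balancer routed this request to.
    """
    assigned = cookies.get("sessdb")
    if assigned not in BACKENDS:
        # New user (or unknown value): pin the session to the local sessionDB.
        return local_backend, False
    # Existing user: always use the sessionDB chosen at session start,
    # even if the load balancer picked a different backend this time.
    return assigned, assigned != local_backend


# A request routed to the "wrong" backend still finds its sessionDB,
# at the cost of one extra network hop.
print(pick_session_db({"sessdb": "backend3"}, "backend1"))
```

With cookie-based routing at the frontend, the second element should be False for nearly every request; without it, the application still works but pays the hop.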
> 
> In the multiple DC case, I will lose persistence if one DC fails.  The 
> forwarded requests to the other DC will have to establish a new session in a 
> new sessionDB, but DC failure should be rare enough that I don't care about 
> this.  My site doesn't need 100% availability, just user-perceived downtime 
> limited to minutes rather than hours.
> 
> On Jan 3, 2013, at 3:49 AM, Baptiste <[email protected]> wrote:
> 
>> basically, you need persistence :)
>> 
>> On Thu, Jan 3, 2013 at 9:45 AM, KT Walrus <[email protected]> wrote:
>>> One more tweak…  I think the frontend LBs could be made to distribute the 
>>> load so that requests go to the backend that has the sessionDB that will be 
>>> used for the request rather than simple RR (by using cookies).  This would 
>>> keep most requests handled entirely by a single backend server.  I kind of 
>>> like this, from an efficiency and simplicity point of view.
>>> 
>>> Most setups seem to want you to place each individual component of the 
>>> backend (HAProxy, Nginx/Varnish, PHP, and MySQL) in separate VPSs (in a 
>>> "cloud" architecture).  But, I'm thinking that it will simplify things if I 
>>> don't use virtualization and have each backend capable of handling the 
>>> entire request.  If I need more capacity in the backend, I simply add 
>>> another backend server that functions independently of the other backends 
>>> (except for handling HA in times of high load where one backend forwards 
>>> the excess requests to its next neighbor backend).
>>> 
>>> I do have one problem in my proposed architecture.  A sessionDB could, 
>>> theoretically, get much more than MAXCONN connections (in the worst case, 
>>> all current requests could use a single sessionDB).  This is because once a 
>>> sessionDB is selected for an individual user, all subsequent requests from 
>>> that user must be handled using this sessionDB.  This means I have to keep 
>>> MAXCONN low enough that if the sessionDB in the backend does have to handle 
>>> all requests to all backends, the server will still function and not be 
>>> overloaded.  It would be nice if this wasn't the case, but I can't think of 
>>> how to avoid this possibility.  If I could, I could probably set MAXCONN to 
>>> utilize 80% of the backend rather than a more conservative 50%, eventually, 
>>> saving significant money in scale out.
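
The worst case described above is simple arithmetic, sketched here in Python. It assumes one sessionDB connection per in-flight request, and the figures used (4 backends, MAXCONN of 100, a sessionDB limit of 1000 connections) are illustrative only, not numbers from this thread.

```python
def worst_case_session_db_connections(n_backends, maxconn):
    """Connections one sessionDB sees if every in-flight request on every
    backend belongs to a user pinned to that single sessionDB."""
    return n_backends * maxconn


def largest_safe_maxconn(n_backends, session_db_limit):
    """Largest per-backend MAXCONN that keeps the worst case within
    session_db_limit (a hypothetical capacity figure for one sessionDB)."""
    return session_db_limit // n_backends


print(worst_case_session_db_connections(4, 100))  # 400
print(largest_safe_maxconn(4, 1000))              # 250
```

The safe MAXCONN shrinks as backends are added, because the worst case scales with the number of backends that can all point at one sessionDB.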
>>> 
>>> On Jan 3, 2013, at 2:56 AM, KT Walrus <[email protected]> wrote:
>>> 
>>>> Thanks for the reply.
>>>> 
>>>>> Why install 2 layers of HAProxy???
>>>>> A single one (on the 2 servers) is enough.
>>>> 
>>>> My thought was that the second layer of HAProxy would ensure that the 
>>>> individual backend server would never have more than MAXCONN requests so I 
>>>> know the server will never be overloaded possibly leading to the server 
>>>> going down or taking too long to process a request.
>>>> 
>>>> I want multiple active frontend lbs so that my architecture will scale 
>>>> infinitely to many more frontends if necessary.  If I eventually needed 
>>>> more than 6 servers, I would set up another 6 servers (using the same 
>>>> setup at 2 data centers for additional HA).
>>>> 
>>>>> Since you're doing SSL, try
>>>>> to make it start multiple processes, a single one dedicated to HTTP
>>>>> and all the others for ciphering/deciphering processing…
>>>> 
>>>> Yes.  I planned on doing that.  My 2 frontend servers are UP (4 cores) 
>>>> while the 4 backend servers can be upgraded to DP (16 cores) and huge RAM 
>>>> (256GBs).  I've already purchased these servers.  I expect that 1 frontend 
>>>> server would be sufficient for a long time, but I want HA by having the 
>>>> two frontends on separate independent power/ethernet connections within 
>>>> the datacenter.
>>>> 
>>>>> I'm not a fan of the first algo, unless you pay for resources per
>>>>> backend server, which is not your case.
>>>> 
>>>> I just thought "first" load balancing was perfect for "guarding" that an 
>>>> individual backend server never exceeded MAXCONN concurrent requests.  The 
>>>> overhead should be minimal since this "guard" HAProxy almost always will 
>>>> pass the request to localhost nginx/varnish.  I need this "guard" because 
>>>> there are multiple frontend LBs doing simple round robin to the backends 
>>>> independently.  This might become more of a possibility when and if I need 
>>>> more LBs independently distributing requests to the backends.
>>>> 
>>>>> Prefer using a hash in your case (even multiple hash with different
>>>>> backends and content switching), that way, your hit rate would be much
>>>>> better.
>>>> 
>>>> I'm not so concerned about individual hit rate as I am about HA and 
>>>> infinite scalability.  It is relatively cheap to add a new server to 
>>>> handle more backend or frontend load, or to place some servers in a 
>>>> new datacenter.  I'd rather have my servers run at 50% capacity 
>>>> (purchasing twice the hardware) if that means increased HA from having the 
>>>> guard HAProxys and never pushing them so hard that individual pieces of 
>>>> the software/hardware stack start to fail.
>>>> 
>>>>> no need to host a sorry page on a far away server, host it on your
>>>>> frontend LBs and HAProxy can deliver it once your server farm is
>>>>> full…
>>>> 
>>>> That is true.  I was really thinking that maybe the first Amazon 
>>>> "overflow" server might be set up to actually have a full backend server.  
>>>> If the sorry page ever starts to be served by Amazon, I would simply 
>>>> create one or more EC2 servers to take the temporary load.  I actually 
>>>> plan on implementing the website as EC2 instances (using this 
>>>> architecture) until my Amazon bill goes over $500 a month at which time I 
>>>> would go colo.
>>>> 
>>>>> Another remark, it may be hard to troubleshoot such infra with 2
>>>>> Active/active LBs.
>>>> 
>>>> I think I have to deal with this, but since each LB is handling unique 
>>>> VIPs (unless keepalived kicks in due to failure), I don't think there is 
>>>> going to be that much trouble.
>>>> 
>>>>> And using DNS rr does not prevent you from using keepalived to ensure
>>>>> HA between your 2 HAProxys.
>>>> 
>>>> Yes.  I am hoping this is the case.  I eventually want at least two 
>>>> geographic locations (east and west coast data centers) so 4 IPs in the 
>>>> DNS to distribute to the closest datacenter.  I use DNSMadeEasy which can 
>>>> support both DNS Global Traffic Director (east coast and west coast IP 
>>>> Anycast) and DNS Failover (in case one datacenter goes offline).
>>>> 
>>>>> 
>>>>> cheers
>>>>> 
>>>>> 
>>>>> On Thu, Jan 3, 2013 at 12:20 AM, KT Walrus <[email protected]> wrote:
>>>>>> I'm setting up a new website in the next month or two.  Even though the 
>>>>>> traffic won't require a scalable HA website, I'm going to start out as 
>>>>>> if the website needs to support huge traffic so I can get some 
>>>>>> experience running such a website.
>>>>>> 
>>>>>> I'd like any feedback on what I am thinking of doing…
>>>>>> 
>>>>>> As for hardware, I am colocating 6 servers at this time and plan to use 
>>>>>> Amazon S3 to host the static files (which should grow quickly to 1TB or 
>>>>>> 2TB of mostly images).  2 of the servers are going to be my frontend 
>>>>>> load balancers running haproxy.  The remaining 4 servers will be 
>>>>>> nginx/varnish servers (nginx for the PHP/MySQL part of the site and 
>>>>>> varnish to cache the Amazon S3 files to save bandwidth charges by 
>>>>>> Amazon).
>>>>>> 
>>>>>> I plan on doing DNS load balancing using pairs of A records for each 
>>>>>> hosted domain that will point to each of my frontend haproxy load 
>>>>>> balancers.  Most traffic will be HTTPS, so I plan on having the frontend 
>>>>>> load balancers handle the SSL (using the new haproxy support for SSL).
>>>>>> 
>>>>>> The two load balancers will proxy to the 4 backend servers.  These 4 
>>>>>> backend servers will run haproxy in front of nginx/varnish with load 
>>>>>> balancing of "first" and a suitable MAXCONN.  Server 1 haproxy will 
>>>>>> first route to the localhost nginx/varnish and when MAXCONN connections 
>>>>>> are active to the localhost, will forward the connection to Server 2 
>>>>>> haproxy.  Server 2 and 3 will be set up similarly to first route 
>>>>>> requests to localhost and when full, route subsequent requests to the 
>>>>>> next server.  Server 4 will route excess requests to a small Amazon EC2 
>>>>>> instance to return a "servers are all busy" page.  Hopefully, I will be 
>>>>>> able to add a 5th backend server at Amazon to handle the overload if it 
>>>>>> looks like I really do have traffic that will fill all 4 backend servers 
>>>>>> that I am colo'ing (I don't really expect this to ever be necessary).
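
The overflow chain described above can be modelled in a few lines of Python. This is a toy model of the routing decision only, not HAProxy's implementation: it ignores queueing, health checks, and connection teardown, and the `SORRY` label and `route` helper are hypothetical names.

```python
SORRY = "busy-page"  # hypothetical label for the "servers are all busy" page


def route(entry, active, maxconn):
    """Return the index of the server that takes a request entering at `entry`.

    active: list of current connection counts, one per backend server.
    Each server keeps the request if it is below maxconn; otherwise it
    forwards to the next server. The last server overflows to SORRY.
    """
    for i in range(entry, len(active)):
        if active[i] < maxconn:
            active[i] += 1  # the request is now being served by server i
            return i
    return SORRY


# Servers 1 and 2 (indexes 0 and 1) are full, so a request arriving at
# server 1 is bounced twice and lands on server 3 (index 2).
load = [2, 2, 1, 0]
print(route(0, load, maxconn=2))  # 2
```

Because the chain never wraps around, a request that enters at the last server has only one chance before it reaches the busy page, so the later servers in the chain absorb most of the overflow.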
>>>>>> 
>>>>>> Nginx will proxy to PHP on localhost and each localhost (of my 4 backend 
>>>>>> servers) will have 2 MySQL instances - one for the main Read-Only DB and 
>>>>>> one for a Read-Write SessionDB.  PHP will go directly to the main DB 
>>>>>> (not through HAProxy) and will use HAProxy to select the proper 
>>>>>> SessionDB to use (each user session must use the same SessionDB so the 
>>>>>> one a request needs might be on any of the backend servers).  Each 
>>>>>> SessionDB will be the master of one slave SessionDB on a different 
>>>>>> backend server for handling the failure of the master (haproxy will send 
>>>>>> requests to the slave SessionDB if the master is down or failing).
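
One way to picture the master/slave pairing above is a ring, where each backend hosts the slave for its neighbour's master. The ring layout is an assumption made for illustration (the description only requires the slave to live on a different backend), and `replica_layout`, `session_db_target`, and the backend names are hypothetical.

```python
def replica_layout(backends):
    """Map each backend to the backend hosting its slave sessionDB.

    Assumes a ring: the slave for backend i lives on backend i+1, and the
    last backend's slave lives on the first. Needs at least two backends
    for the slave to be on a different machine.
    """
    n = len(backends)
    return {backends[i]: backends[(i + 1) % n] for i in range(n)}


def session_db_target(master, layout, is_up):
    """Return the host to use for a session whose master sessionDB is on
    `master`: the master if it is up, else its slave, else None."""
    if is_up(master):
        return master
    slave_host = layout[master]
    return slave_host if is_up(slave_host) else None


layout = replica_layout(["b1", "b2", "b3", "b4"])
down = {"b4"}
print(session_db_target("b4", layout, lambda host: host not in down))  # b1
```

In this layout a single backend failure never removes both copies of a sessionDB, but two adjacent failures in the ring do.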
>>>>>> 
>>>>>> So, each backend server will have haproxy to "first" balance HTTP to 
>>>>>> nginx/varnish.  The backends also have PHP and 3 instances of MySQL (one 
>>>>>> for mainDB, one for master sessionDB, and one for another backend's 
>>>>>> slave sessionDB).
>>>>>> 
>>>>>> Also, the 2 frontend servers will be running separate instances of 
>>>>>> haproxy.  I hope to use keepalived to route the VIPs for one frontend to 
>>>>>> the other frontend in case of failure.  Or, should I use heartbeat?  
>>>>>> There seem to be two HA solutions here.
>>>>>> 
>>>>>> I know this is a very long description of what I am thinking of doing 
>>>>>> and I thank you if you have read this far.  I'm looking for any comments 
>>>>>> on this setup.  Any comments on using "first" load 
>>>>>> balancing/MAXCONN on the backend servers so that a request load balanced 
>>>>>> from the frontend will keep the backend servers from overloading 
>>>>>> (possibly bouncing a request from server 1 to server 2 to server 3 to 
>>>>>> server 4 to EC2 "server busy" server) are especially appreciated.  Also, 
>>>>>> any comments on using pairs of master/slave sessionDBs to provide high 
>>>>>> availability but still have session data saved/retrieved for a given 
>>>>>> user from the same DB are appreciated.  I believe this setup will allow 
>>>>>> the load to be distributed evenly over the 4 backends and only have the 
>>>>>> front end load balancers do simple round robin without session 
>>>>>> stickiness.
>>>>>> 
>>>>>> Kevin
>>>>>> 
>>>>>> 
>>>> 
>>> 
> 
>
