Also, Kevin, I don't really know what the database usage profile of your app is, but I'd immediately rule out installing the DB on the web servers, especially having two MySQL instances on *each* machine that will be serving PHP...
Cheers,
Pedro.

On 3 Jan 2013, at 09:25, KT Walrus <[email protected]> wrote:

>> basically, you need persistence :)
>
> Well, I only need persistence to optimize traffic flow so the correct
> sessionDB is used (eliminating a network hop). But, the system will still
> function without persistence (in HAProxy) as the PHP code will know which
> sessionDB it needs to use for a given user. In this case, persistence can be
> ensured by the PHP code even if HAProxy routes to a suboptimal initial
> backend.
>
> In the multiple DC case, I will lose persistence if one DC fails. The
> forwarded requests to the other DC will have to establish a new session in a
> new sessionDB, but DC failure should be rare enough that I don't care about
> this. My site doesn't need 100% availability, just minimized user perceived
> downtime of minutes rather than hours.
>
> On Jan 3, 2013, at 3:49 AM, Baptiste <[email protected]> wrote:
>
>> basically, you need persistence :)
>>
>> On Thu, Jan 3, 2013 at 9:45 AM, KT Walrus <[email protected]> wrote:
>>> One more tweak… I think the frontend LBs could be made to distribute the
>>> load so that requests go to the backend that has the sessionDB that will be
>>> used for the request rather than simple RR (by using cookies). This would
>>> keep most requests handled entirely by a single backend server. I kind of
>>> like this, from an efficiency and simplicity point of view.
>>>
>>> Most setups seem to want you to place each individual component of the
>>> backend (HAProxy, Nginx/Varnish, PHP, and MySQL) in separate VPSs (in a
>>> "cloud" architecture). But, I'm thinking that it will simplify things if I
>>> don't use virtualization and have each backend capable of handling the
>>> entire request.
>>> If I need more capacity in the backend, I simply add
>>> another backend server that functions independently of the other backends
>>> (except for handling HA in times of high load where one backend forwards
>>> the excess requests to its next neighbor backend).
>>>
>>> I do have one problem in my proposed architecture. A sessionDB could,
>>> theoretically, get much more than MAXCONN connections (up to and including
>>> all current requests could use a single sessionDB). This is because once a
>>> sessionDB is selected for an individual user, all subsequent requests from
>>> that user must be handled using this sessionDB. This means I have to keep
>>> MAXCONN low enough that if the sessionDB in the backend does have to handle
>>> all requests to all backends, the server will still function and not be
>>> overloaded. It would be nice if this wasn't the case, but I can't think of
>>> how to avoid this possibility. If I could, I could probably set MAXCONN to
>>> utilize 80% of the backend rather than a more conservative 50%, eventually
>>> saving significant money in scale out.
>>>
>>> On Jan 3, 2013, at 2:56 AM, KT Walrus <[email protected]> wrote:
>>>
>>>> Thanks for the reply.
>>>>
>>>>> Why installing 2 layers of HAProxy???
>>>>> A single one (on the 2 servers is enough).
>>>>
>>>> My thought was that the second layer of HAProxy would ensure that the
>>>> individual backend server would never have more than MAXCONN requests so I
>>>> know the server will never be overloaded possibly leading to the server
>>>> going down or taking too long to process a request.
>>>>
>>>> I want multiple active frontend lbs so that my architecture will scale
>>>> infinitely to many more frontends if necessary. If I eventually needed
>>>> more than 6 servers, I would set up another 6 servers (using the same
>>>> setup at 2 data centers for additional HA).
>>>>
>>>>> Since you're doing SSL, try
>>>>> to make it start multiple processes, a single one dedicated to HTTP
>>>>> and all the others for ciphering/deciphering processing…
>>>>
>>>> Yes. I planned on doing that. My 2 frontend servers are UP (4 cores)
>>>> while the 4 backend servers can be upgraded to DP (16 cores) and huge RAM
>>>> (256 GB). I've already purchased these servers. I expect that 1 frontend
>>>> server would be sufficient for a long time, but I want HA by having the
>>>> two frontends on separate independent power/ethernet connections within
>>>> the datacenter.
>>>>
>>>>> I'm not a fan of first algo, unless you pay for the resource per number
>>>>> of backend servers, which is not your case.
>>>>
>>>> I just thought "first" load balancing was perfect for "guarding" that an
>>>> individual backend server never exceeded MAXCONN concurrent requests. The
>>>> overhead should be minimal since this "guard" HAProxy almost always will
>>>> pass the request to localhost nginx/varnish. I need this "guard" because
>>>> there are multiple frontend LBs doing simple round robin to the backends
>>>> independently. This might become more of a possibility when and if I need
>>>> more LBs independently distributing requests to the backends.
>>>>
>>>>> Prefer using a hash in your case (even multiple hash with different
>>>>> backends and content switching), that way, your hit rate would be much
>>>>> better.
>>>>
>>>> I'm not so concerned about individual hit rate as I am about HA and
>>>> infinite scalability. It is relatively cheap to add a new server to
>>>> handle more backend or frontend load, or to split off some servers into a
>>>> new datacenter. I'd rather have my servers run at 50% capacity
>>>> (purchasing twice the hardware) if that means increased HA from having the
>>>> guard HAProxys and never coming close to pushing them so hard that
>>>> individual pieces of the software/hardware stack start to fail.
>>>>
>>>>> no need to host a sorry page on a far away server, host it on your
>>>>> frontend LBs and HAProxy can deliver it once your server farm is
>>>>> full…
>>>>
>>>> That is true. I was really thinking that maybe the first Amazon
>>>> "overflow" server might be set up to actually have a full backend server.
>>>> If the sorry page ever starts to be served by Amazon, I would simply
>>>> create one or more EC2 servers to take the temporary load. I actually
>>>> plan on implementing the website as EC2 instances (using this
>>>> architecture) until my Amazon bill goes over $500 a month, at which time I
>>>> would go colo.
>>>>
>>>>> Another remark, it may be hard to troubleshoot such infra with 2
>>>>> Active/active LBs.
>>>>
>>>> I think I have to deal with this, but since each LB is handling unique
>>>> VIPs (unless keepalived kicks in due to failure), I don't think there is
>>>> going to be that much trouble.
>>>>
>>>>> And using DNS rr does not prevent you from using keepalived to ensure
>>>>> HA between your 2 HAProxys.
>>>>
>>>> Yes. I am hoping this is the case. I eventually want at least two
>>>> geographic locations (east and west coast data centers) so 4 IPs in the
>>>> DNS to distribute to the closest datacenter. I use DNSMadeEasy which can
>>>> support both DNS Global Traffic Director (east coast and west coast IP
>>>> Anycast) and DNS Failover (in case one datacenter goes offline).
>>>>
>>>>>
>>>>> cheers
>>>>>
>>>>>
>>>>> On Thu, Jan 3, 2013 at 12:20 AM, KT Walrus <[email protected]> wrote:
>>>>>> I'm setting up a new website in the next month or two. Even though the
>>>>>> traffic won't require a scalable HA website, I'm going to start out as
>>>>>> if the website needs to support huge traffic so I can get some
>>>>>> experience running such a website.
>>>>>>
>>>>>> I'd like any feedback on what I am thinking of doing…
>>>>>>
>>>>>> As for hardware, I am colocating 6 servers at this time and plan to use
>>>>>> Amazon S3 to host the static files (which should grow quickly to 1TB or
>>>>>> 2TB of mostly images). 2 of the servers are going to be my frontend
>>>>>> load balancers running haproxy. The remaining 4 servers will be
>>>>>> nginx/varnish servers (nginx for the PHP/MySQL part of the site and
>>>>>> varnish to cache the Amazon S3 files to save bandwidth charges by
>>>>>> Amazon).
>>>>>>
>>>>>> I plan on doing DNS load balancing using pairs of A records for each
>>>>>> hosted domain that will point to each of my frontend haproxy load
>>>>>> balancers. Most traffic will be HTTPS, so I plan on having the frontend
>>>>>> load balancers handle the SSL (using the new haproxy support for SSL).
>>>>>>
>>>>>> The two load balancers will proxy to the 4 backend servers. These 4
>>>>>> backend servers will run haproxy in front of nginx/varnish with load
>>>>>> balancing of "first" and a suitable MAXCONN. Server 1 haproxy will
>>>>>> first route to the localhost nginx/varnish and when MAXCONN connections
>>>>>> are active to the localhost, will forward the connection to Server 2
>>>>>> haproxy. Server 2 and 3 will be set up similarly to first route
>>>>>> requests to localhost and when full, route subsequent requests to the
>>>>>> next server. Server 4 will route excess requests to a small Amazon EC2
>>>>>> instance to return a "servers are all busy" page. Hopefully, I will be
>>>>>> able to add a 5th backend server at Amazon to handle the overload if it
>>>>>> looks like I really do have traffic that will fill all 4 backend servers
>>>>>> that I am colo'ing (I don't really expect this to ever be necessary).
>>>>>>
>>>>>> Nginx will proxy to PHP on localhost and each localhost (of my 4 backend
>>>>>> servers) will have 2 MySQL instances - one for the main Read-Only DB and
>>>>>> one for a Read-Write SessionDB.
>>>>>> PHP will go directly to the main DB
>>>>>> (not through HAProxy) and will use HAProxy to select the proper
>>>>>> SessionDB to use (each user session must use the same SessionDB so the
>>>>>> one a request needs might be on any of the backend servers). Each
>>>>>> SessionDB will be the master of one slave SessionDB on a different
>>>>>> backend server for handling the failure of the master (haproxy will send
>>>>>> requests to the slave SessionDB if the master is down or failing).
>>>>>>
>>>>>> So, each backend server will have haproxy to "first" balance HTTP to
>>>>>> nginx/varnish. The backends also have PHP and 3 instances of MySQL (one
>>>>>> for mainDB, one for master sessionDB, and one for another backend's
>>>>>> slave sessionDB).
>>>>>>
>>>>>> Also, the 2 frontend servers will be running separate instances of
>>>>>> haproxy. I hope to use keepalived to route the VIPs for one frontend to
>>>>>> the other frontend in case of failure. Or, should I use heartbeat?
>>>>>> There seem to be two HA solutions here.
>>>>>>
>>>>>> I know this is a very long description of what I am thinking of doing
>>>>>> and I thank you if you have read this far. I'm looking for any comments
>>>>>> on this setup. In particular, any comments on using "first" load
>>>>>> balancing/MAXCONN on the backend servers so that a request load balanced
>>>>>> from the frontend will keep the backend servers from overloading
>>>>>> (possibly bouncing a request from server 1 to server 2 to server 3 to
>>>>>> server 4 to EC2 "server busy" server) are especially appreciated. Also,
>>>>>> any comments on using pairs of master/slave sessionDBs to provide high
>>>>>> availability but still have session data saved/retrieved for a given
>>>>>> user from the same DB are appreciated. I believe this setup will allow
>>>>>> the load to be distributed evenly over the 4 backends and only have the
>>>>>> front end load balancers do simple round robin without session
>>>>>> stickiness.
>>>>>>
>>>>>> Kevin
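
For anyone following the thread, the "first"/MAXCONN cascade Kevin describes (serve locally until MAXCONN requests are active, then hand off to the next server, with the last server falling through to the "servers are all busy" page) can be modelled roughly as below. This is only a toy sketch in Python, not HAProxy configuration: the names Backend, route_first and simulate, the "sorry-page" label, and the maxconn value of 2 are all made up for illustration, and requests never complete in this model.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    maxconn: int      # hypothetical per-server cap, standing in for MAXCONN
    active: int = 0   # requests currently held by this server


def route_first(backends, entry=0, sorry="sorry-page"):
    """Route one request that a frontend LB delivered to backends[entry].

    The request is served by the first server, from the entry point
    onward, that still has spare capacity. The chain does not wrap
    around: once the last server is full, the sorry page is returned.
    """
    for backend in backends[entry:]:
        if backend.active < backend.maxconn:
            backend.active += 1
            return backend.name
    return sorry


def simulate(n_requests, n_backends=4, maxconn=2):
    """Frontends do plain round robin over entry points; nothing completes."""
    backends = [Backend(f"server{i + 1}", maxconn) for i in range(n_backends)]
    return [route_first(backends, entry=i % n_backends)
            for i in range(n_requests)]


if __name__ == "__main__":
    for hop, served_by in enumerate(simulate(9), start=1):
        print(hop, served_by)
```

With 4 servers and a cap of 2, the first 8 requests fill the servers evenly and the 9th falls through to the sorry page. One thing the model makes visible: because the chain does not wrap, a request that enters at server 4 while it is full gets the sorry page even if server 1 is idle, so even spreading still depends on the frontends' round robin.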

