On May 22, 5:20 am, Cliff Wells <[EMAIL PROTECTED]> wrote:
> > Sites that are amongst the largest on the internet fall into a corner
> > case in my mind. As Mike pointed out, sites have an unrealistic
> > expectation of traffic. I've been involved in the average cases.
>
> As have I. But I'm going to disassemble this argument below.
>
> > My claims come from years in the service provider industry,
> > watching various deployments. I've been an Apache fan for a long
> > time, and have seen and deployed hundreds of servers, serving
> > thousands of sites on Apache.
>
> I think this is true for all of us. The difference is that the world
> has changed in the last couple of years and now there's more options to
> choose from. And by "options" I don't mean "a smaller, less capable
> Apache clone", I mean a paradigm shift in how to handle high loads.
> It's well known that threaded/process based servers cannot scale beyond
> a reasonable point. Nginx and Lighttpd are async and are specifically
> written to address the C10K problem.
There are two approaches one can use for addressing scalability: vertical scaling and horizontal scaling. With vertical scaling you just upgrade your existing single machine to a bigger, more capable machine. On this path, then yes, nginx and lighttpd may give you more headroom than Apache. The problem with vertical scaling is cost, plus that you will hit the limit of what the hardware can achieve much sooner than with horizontal scaling.

With horizontal scaling you keep your existing machine and just add more machines. Here the limit is going to be how easy it is to spread your application across a growing number of machines. The scalability of Apache isn't generally going to be an issue in this case, as you would have sufficient machines to spread the load and so avoid unduly overloading any single machine. Although you are buying more hardware with horizontal scaling, the cost/performance curve generally increases at a lesser rate than with vertical scaling. This is tempered by increasing maintenance costs from having to support multiple machines, but if the machines are all identical and treated as appliances which are either rebuilt or replaced when a failure occurs, even that isn't really a problem. Of course, there is still a whole lot more to it than that, as you need to consider power costs, networking costs for hardware/software load balancing, failover, and the possible need for multiple data centres distributed over different geographic locations. So, what limitations exist and what other issues come into consideration really depend on how you scale up your system.

One thing that keeps troubling me about some of the discussions here as to which solution may be better than another is that they appear to focus on which solution may be best for static file serving or proxying etc. One has to keep in mind that Python web applications have different requirements than these use cases.
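The horizontal approach can be sketched in a few lines. This is only an illustration of the idea of spreading load over identical machines (the hostnames are hypothetical, and a real deployment would use a hardware or software load balancer with health checks, not application code like this):

```python
# Illustrative sketch only: round-robin distribution of requests over
# identical application machines. Adding capacity just means adding
# another hostname to the list; no single machine carries all the load.
import itertools

# Hypothetical backend machines, all running the same application.
BACKENDS = ["app1.example.com", "app2.example.com", "app3.example.com"]

def round_robin(backends):
    """Yield backends in rotation, forever."""
    return itertools.cycle(backends)

rr = round_robin(BACKENDS)
assignments = [next(rr) for _ in range(6)]
print(assignments)  # each backend receives every third request
```

The point of the sketch is that the per-machine web server's raw throughput matters less here: you scale by adding hosts, not by squeezing the last request per second out of one.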
Python web applications also have different needs from PHP applications. As I originally pointed out, for Python web applications in general any solution will do, as it isn't the network or the web server arrangement that will be the bottleneck. What does it matter if one solution is twice as fast as another for a simple hello world program, when the actual request time saved by that solution when applied to a real world application is far less than 1% of overall request time?

For non-database Python web applications, issues such as the GIL, and how multithreading and/or multiple processes are used, are going to be a bigger concern and have more impact on performance. This is inasmuch as running a single multithreaded process isn't going to cut it when scaling. Thus ease of configuring the use of multiple processes is more important, as is the ability to recycle processes to avoid issues with increasing memory usage. There is also the balance between having a fixed number of processes, as is necessary when using fastcgi-like approaches, and the ability in something like Apache to dynamically adjust the number of processes handling requests. Add databases into the mix and you get into a whole new bunch of issues, which others are already discussing.

Memory usage in all of this is a big issue, and granted, for static file serving nginx and lighttpd will consume less memory. The difference for a dynamic Python web application isn't going to be that marked, though. If you are running an 80MB Python web application process, it is still going to be about that size whatever hosting solution you use. This is because the memory usage comes from the Python web application, not the underlying web server. The problem is more to do with how you manage the multiple instances of that 80MB process.

There have been discussions over on the Python WEB-SIG about making WSGI better support asynchronous web servers.
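The process-recycling idea above can be made concrete with a small sketch. Everything here is illustrative rather than any real server's API; it mimics the "maximum requests per child" behaviour Apache offers, where a worker serves a bounded number of requests and is then retired and replaced, so slow memory growth in the application never accumulates indefinitely:

```python
# Hedged sketch of worker recycling. A real server forks OS processes;
# this just models the bookkeeping the master process performs.

class RecyclingWorker:
    """Stand-in for one application process."""
    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.handled = 0

    def handle(self, request):
        self.handled += 1
        return request.upper()  # stand-in for real request handling

    @property
    def expired(self):
        return self.handled >= self.max_requests

class WorkerPool:
    """Replaces expired workers, mimicking a master process re-forking."""
    def __init__(self, size, max_requests):
        self.max_requests = max_requests
        self.workers = [RecyclingWorker(max_requests) for _ in range(size)]
        self.recycled = 0

    def dispatch(self, request):
        # Send the request to the least-used worker.
        worker = min(self.workers, key=lambda w: w.handled)
        response = worker.handle(request)
        if worker.expired:
            # Retire the worker and start a fresh one; in a real server
            # this is where accumulated memory is released back to the OS.
            self.workers.remove(worker)
            self.workers.append(RecyclingWorker(self.max_requests))
            self.recycled += 1
        return response

pool = WorkerPool(size=2, max_requests=3)
responses = [pool.dispatch("req") for _ in range(10)]
print(pool.recycled)  # workers retired once they hit their request cap
```

Note that the recycling happens regardless of whether memory actually grew; the bound on requests per process is what gives the predictability.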
Part of their rationale was that it gave better scalability, because it could handle more concurrent requests and wouldn't be restricted by the number of threads being used. The problem that was pointed out to them, which they then didn't address, is that where one is handling more concurrent requests, the transient memory requirements of your process can theoretically be greater. At least where you have a set number of threads, you can get a handle on what maximum memory usage may be by looking at the maximum transient requirements of your worst request handler. With an asynchronous model, where theoretically an unbounded number of concurrent requests could be handled at the same time, you could really blow out your memory requirements if they all hit the same memory-hungry request handler at the same time. A more traditional synchronous model can thus give you more predictability, which for large systems can in itself be an important consideration.

Anyway, this is getting a fair bit off topic, and since others are seeing my rambles as such, I'll try and refrain in future. :-)

Graham

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en