On Wed, 2008-05-21 at 17:53 -0700, Graham Dumpleton wrote:
> On May 22, 5:20 am, Cliff Wells <[EMAIL PROTECTED]> wrote:
> > I think this is true for all of us.  The difference is that the world
> > has changed in the last couple of years and now there are more options to
> > choose from.  And by "options" I don't mean "a smaller, less capable
> > Apache clone", I mean a paradigm shift in how to handle high loads.
> > It's well known that threaded/process based servers cannot scale beyond
> > a reasonable point.  Nginx and Lighttpd are async and are specifically
> > written to address the C10K problem.
> 
> There are two approaches one can use for addressing scalability, they
> are vertical scaling and horizontal scaling.
> 
> In vertical scaling you just upgrade your existing single machine
> to a bigger, more capable machine. For this path then yes, nginx and
> lighttpd may give you more headroom than Apache. The problem with
> vertical scaling is cost, plus the fact that you will hit the limit
> of what the hardware can achieve much sooner than with horizontal
> scaling.

Except that vertical scaling doesn't preclude horizontal scaling, it
merely postpones the necessity for implementing it (if not the planning)
and helps limit the scope of it.  If Nginx provides superior vertical
scaling, then it will also provide superior horizontal scaling since
vertically scaled systems are the building blocks of a horizontally
scaled system.  

> With horizontal scaling you keep your existing machine and just add
> more machines. For horizontal scaling, the limit is going to be how
> easy it is to accommodate your application across a growing number of
> machines. The scalability of Apache here isn't generally going to be
> an issue as you would have sufficient machines to spread the load so
> as to not unduly overload a single machine.
> Although one is buying more hardware with horizontal scaling, the cost/
> performance curve generally increases at a lesser rate than with
> vertical scaling. 

Again, I think this contrast is artificial.  You are setting up vertical
scaling and horizontal scaling as mutually exclusive when they are
anything but, and unless you have endlessly deep pockets, you should
prefer to control the growth of your horizontal scaling.

> Of course, there is still a whole lot more to it than that as you need
> to consider power costs, networking costs for hardware/software load
> balancing, failover and possible need for multiple data centres
> distributed over different geographic locations.

Absolutely.  And while hardware costs are dropping, hosting and power
costs are going up.  My colocation fees have increased an average of 10%
per year, and power fees have quadrupled since I started.  I don't
expect this trend to change any time soon.  

> One thing that keeps troubling me about some of the discussions here
> as to what solution may be better than another is that they appear to
> focus on what solution may be best for static file sharing or proxying
> etc. One has to keep in mind that Python web applications have
> different requirements than these use cases. Python web applications
> also have different needs from PHP applications.

Given that an average web page is probably 70% or more static or cached
content, I think this is a critical aspect.

> As I originally pointed out, for Python web applications, in general
> any solution will do as it isn't the network or the web server
> arrangement that will be the bottleneck. What does it matter if one
> solution is twice as fast as another for a simple hello world
> program, when the actual request time saved by that solution when
> applied to a real world application is far less than 1% of overall
> request time?

If you try to scale a dynamic application by passing every request off
to Python, you are going to either fail spectacularly or spend an awful
lot of money scaling horizontally.
There's a reason people have successfully deployed huge Rails apps and
it's not often by having 300 servers.  They manage it by making sure
that Rails is only called when absolutely necessary and letting a fast
webserver handle most of the load.  

In any case, the same techniques are going to be applied regardless of
which web server you choose.  The question is more "how much of my
limited and expensive resources is this single part of my stack going to
consume and what benefit will I be getting for it?"  Unless you require
a specific module, Nginx and Apache are more-or-less functionally
equivalent, except that one uses a fraction of the resources of the
other.

> For non database Python web applications issues such as the GIL, and
> how multithreading and/or multiple processes are used is going to be a
> bigger concern and have more impact on performance. This is because
> running a single multithreaded process isn't going to cut it when
> scaling. Thus ease of configuring multiple processes is more
> important, as is the ability to recycle processes to avoid issues with
> increasing memory usage.

I'd consider "increasing memory usage" to be a bug in the application
and outside the scope of discussion.  As far as ease of configuring
multiple processes, I use Nginx's built-in load balancing and a 4 line
shell script to start my application.  Don't get me wrong, I think
Apache's process management is quite nice and I'd like to see something
similar added to Nginx, but it's hardly a show-stopper. 
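
For what it's worth, the arrangement I'm describing is roughly the
following (a sketch only; the port numbers and paths are hypothetical,
not my actual config):

```nginx
# Round-robin load balancing across several local Pylons backends,
# with nginx serving static files itself. Ports/paths are made up.
upstream pylons_backends {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
}

server {
    listen 80;

    # Static content never touches a Python process.
    location /static/ {
        root /var/www/myapp;
    }

    # Everything else is proxied to the application instances.
    location / {
        proxy_pass http://pylons_backends;
    }
}
```

The shell script I mentioned is nothing more than a loop that starts
one backend per port in that upstream block.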

>  There is also the balance between having
> fixed numbers of processes as is necessary when using fastcgi like
> approaches, or the ability in something like Apache to dynamically
> adjust the number of processes to handle requests. 

Remember you said this (see below*).

> Add databases into
> the mix and you get into a whole new bunch of issues, which others are
> already discussing.
> Memory usage in all of this is a big issue and granted that for static
> file serving nginx and lighttpd will consume less memory. The difference
> though for a dynamic Python web application isn't going to be that
> marked. 

I disagree.  As I mentioned earlier, someone I know recently took an
Apache/mod_php application consuming 1.2GB of RAM down to 200MB using
Nginx/FastCGI with no loss in performance or functionality.  It's not
clear to me why a Python application would be much different.  

> If you are running a 80MB Python web application process, it
> is still going to be about that size whatever hosting solution you
> use. This is because the memory usage is from the Python web
> application, not the underlying web server. The problem is more to do
> with how you manage the multiple instances of that 80MB process.

Sort of.  However consider this: if I am running Nginx I can reasonably
*fill* a single server with Python processes and not worry too much
about how much memory Nginx consumes.  The resources are available for
running the *application* rather than the webserver.  Because the Python
application will undoubtedly be one of the first bottlenecks (database
next), the ability to horizontally scale the application (by running
multiple instances) is critical.  By using up system resources, Apache
limits the number of instances of the application that can be run on a
single machine, and by extension across multiple machines.
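
To put rough numbers on that (all figures below are illustrative
assumptions, not measurements):

```python
# Back-of-envelope: how many 80MB application processes fit on one box,
# given the memory the web server itself eats. Every number here is a
# made-up but plausible assumption for the sake of the argument.

def app_instances(total_ram_mb, server_overhead_mb, app_process_mb=80):
    """Number of app processes that fit after the web server takes its share."""
    return (total_ram_mb - server_overhead_mb) // app_process_mb

RAM = 4096  # a hypothetical 4GB server

# A lean event-driven server vs. a prefork server with many workers.
lean = app_instances(RAM, server_overhead_mb=20)
heavy = app_instances(RAM, server_overhead_mb=1200)

print(lean, heavy)  # 50 vs 36 application instances
```

The exact overheads are debatable; the point is only that every
megabyte the web server keeps is a megabyte the application can't use.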

> There have been discussions over on the Python WEB-SIG about making
> WSGI better support asynchronous web servers. Part of their rationale
> was that it gave better scalability because it could handle more
> concurrent requests and wouldn't be restricted by number of threads
> being used. The problem that was pointed out to them, which they then
> didn't address, is that when one is handling more concurrent requests,
> the transient memory requirements of your process then theoretically
> can be more.
 
> At least where you have a set number of threads you can
> get a handle on what maximum memory usage may be by looking at the
> maximum transient requirements of your worst request handler. 

Then you agree that dynamically adjusting the process pool size is bad
since it would have the same net effect?  This appears (to me) to
contradict what you claimed as a feature earlier [*].

> With an
> asynchronous model where theoretically an unbounded number of
> concurrent requests could be handled at the same time, you could
> really blow out your memory requirements if they all hit the same
> memory hungry request handler at the same time. A more traditional
> synchronous model can thus give you more predictability, which for
> large systems can in itself be an important consideration.

Of course, this is where your earlier suggestion of using a hardware
load-balancer would be a good idea.  I think a much better use of
resources (read "money") would be spending some of it on a dedicated
load-balancing solution which can control how requests are distributed
rather than repurposing inefficiency into a feature.

At any rate, I don't actually think the above has much to do with Nginx
vs Apache as Pylons deployment options.  Because Pylons tends to be run
as a threaded app (is anyone doing otherwise?), we still have the same
predictability.  In fact our predictability is easier since we don't
need to calculate the cost of the web server's memory explosion in
addition to our application's needs.
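
The bound you describe is easy to state: with a fixed thread pool the
worst-case transient memory is capped up front, while unbounded async
concurrency has no such cap. A toy calculation (all numbers invented):

```python
# Worst-case resident memory if every in-flight request hits the most
# memory-hungry handler at once. Figures are illustrative assumptions.

def worst_case_mb(concurrency, peak_per_request_mb, base_mb):
    """Upper bound: base process size plus peak transient cost per
    concurrent request."""
    return base_mb + concurrency * peak_per_request_mb

BASE = 80  # resident size of the app process
PEAK = 5   # worst transient allocation for any single request

# A 10-thread pool: the bound is fixed and known in advance.
print(worst_case_mb(10, PEAK, BASE))    # 130

# An async server with 2000 concurrent requests in flight has no
# comparable cap short of explicitly limiting concurrency.
print(worst_case_mb(2000, PEAK, BASE))  # 10080
```

Which is exactly why a load-balancer (or any explicit concurrency
limit) in front of the app restores the same predictability.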

In all of the above, I haven't seen any explanation from you as to why
Apache would be superior to Nginx as a deployment option, only that it
wouldn't be the worst bottleneck in your application stack.  Not
terribly convincing.  If we were discussing a closed-source solution
versus an open source solution, this might be sufficient ("good
enough"), but that's not the case here.

I'll give you a quick list of actual benefits I see from using Nginx:

1) low CPU overhead
2) small memory footprint
3) consistent latency for responses
4) scalable in all directions
5) simple and syntactically consistent configuration

Benefits I see for Apache:

1) excellent documentation
2) wide array of modules, especially esoteric ones
3) mod_wsgi provides a slightly more efficient communication gateway to
Python backends
4) automatic process management (restarting backends)

Of Apache's benefits, I see:

1) as mostly moot due to Nginx's simplicity
2) completely moot since I don't use them
3) not enough to overcome the efficiency lost elsewhere
4) as mostly moot because it's simple to solve in other ways

This probably doesn't exactly match other people's requirements and
certainly there are other considerations that might tip the scales one
way or the other.

> Anyway, this is getting a fair bit off topic and since others are
> seeing my rambles as such, I'll try and refrain in future. :-)

Please don't.  You happen to be one of the few feather-heads I don't
mind hearing from, even if I find your arguments kind of slippery ;-)
And incidentally, congrats on your baby =)

For people who care more about numbers than theoretical discussions (aka
"obstinate") please refer to the following which provides a fairly
decent overview of resource utilization between the two servers:

http://www.joeandmotorboat.com/2008/02/28/apache-vs-nginx-web-server-performance-deathmatch/


Regards,
Cliff


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
