On May 22, 5:20 am, Cliff Wells <[EMAIL PROTECTED]> wrote:
> > Sites that are amongst the largest on the internet fall into a corner
> > case in my mind.  As Mike pointed out, sites have an unrealistic
> > expectation of traffic.  I've been involved in the average cases.
>
> As have I.  But I'm going to disassemble this argument below.
>
> > My claims come from years in the service provider industry,
> > watching various deployments.  I've been an Apache fan for a long
> > time, and have seen and deployed hundreds of servers, serving
> > thousands of sites on Apache.
>
> I think this is true for all of us.  The difference is that the world
> has changed in the last couple of years and now there's more options to
> choose from.  And by "options" I don't mean "a smaller, less capable
> Apache clone", I mean a paradigm shift in how to handle high loads.
> It's well known that threaded/process based servers cannot scale beyond
> a reasonable point.  Nginx and Lighttpd are async and are specifically
> written to address the C10K problem.

There are two approaches to addressing scalability: vertical scaling
and horizontal scaling.

With vertical scaling you replace your existing single machine with a
bigger, more capable machine. On this path then yes, nginx and
lighttpd may give you more headroom than Apache. The problems with
vertical scaling are cost, and that you will hit the limit of what the
hardware can achieve much sooner than with horizontal scaling.

With horizontal scaling you keep your existing machine and just add
more machines. Here the limit is going to be how easily your
application can be spread across a growing number of machines. The
scalability of Apache isn't generally going to be an issue, as you
would have sufficient machines to spread the load so that no single
machine is unduly overloaded.

Although one is buying more hardware with horizontal scaling, the cost/
performance curve generally increases at a lesser rate than with
vertical scaling. This is, however, tempered by the increasing
maintenance cost of supporting multiple machines. If the machines are
all identical, though, and are treated as appliances which are either
rebuilt or replaced when a failure occurs, even that isn't really a
problem.

Of course, there is still a whole lot more to it than that, as you
need to consider power costs, networking costs for hardware/software
load balancing, failover and the possible need for multiple data
centres distributed over different geographic locations.

So, what limitations exist and what other issues come into
consideration really depend on how you scale up your system.

One thing that keeps troubling me about some of the discussions here
as to which solution may be better than another is that they appear to
focus on which solution is best for static file serving or proxying
etc. One has to keep in mind that Python web applications have
different requirements than these use cases. Python web applications
also have different needs than PHP applications.

As I originally pointed out, for Python web applications, in general
any solution will do, as it isn't the network or the web server
arrangement that will be the bottleneck. What does it matter if one
solution is twice as fast as another for a simple hello world
program, when the actual request time saved by that solution, when
applied to a real world application, is far less than 1% of overall
request time?
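The "less than 1%" point can be made concrete with some back-of-the-envelope arithmetic. The numbers below are illustrative, not measured:

```python
# Illustrative numbers only: a "twice as fast" web server matters
# little when server overhead is a tiny slice of total request time.
app_time_ms = 200.0         # time spent in the Python application itself
server_a_overhead_ms = 2.0  # per-request overhead of server A
server_b_overhead_ms = 1.0  # server B: "twice as fast" at hello world

total_a = app_time_ms + server_a_overhead_ms
total_b = app_time_ms + server_b_overhead_ms

saving = (total_a - total_b) / total_a * 100
print(f"overall request time saved: {saving:.2f}%")  # ~0.5%
```

Doubling the server's raw speed shaves half a percent off the real request, because the application itself dominates.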

For non-database Python web applications, issues such as the GIL, and
how multithreading and/or multiple processes are used, are going to be
a bigger concern and have more impact on performance. This is because
running a single multithreaded process isn't going to cut it when
scaling. Thus ease of configuring the use of multiple processes is
more important, as is the ability to recycle processes to avoid issues
with increasing memory usage. There is also the trade-off between
having a fixed number of processes, as is necessary when using
fastcgi-like approaches, and the ability in something like Apache to
dynamically adjust the number of processes handling requests. Add
databases into the mix and you get into a whole new bunch of issues,
which others are already discussing.

Memory usage in all of this is a big issue, and granted, for static
file serving nginx and lighttpd will consume less memory. The
difference for a dynamic Python web application isn't going to be
that marked, though. If you are running an 80MB Python web application
process, it is still going to be about that size whatever hosting
solution you use. This is because the memory usage comes from the
Python web application, not the underlying web server. The problem is
more to do with how you manage the multiple instances of that 80MB
process.
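Some rough arithmetic shows why managing instance count dwarfs the server's own footprint. The 80MB figure is from the text above; the per-worker server overhead is an illustrative guess:

```python
# The dominant memory cost is the application, not the server.
app_process_mb = 80      # one Python web application process (from text)
server_overhead_mb = 3   # rough, illustrative per-worker server overhead

for workers in (2, 5, 10):
    total = workers * (app_process_mb + server_overhead_mb)
    print(f"{workers:2d} workers -> ~{total} MB")
```

Going from 5 to 10 workers adds hundreds of megabytes regardless of which web server fronts them; shaving a megabyte or two off the server itself barely registers.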

There have been discussions over on the Python WEB-SIG about making
WSGI better support asynchronous web servers. Part of the rationale
was that it gave better scalability because an asynchronous server
could handle more concurrent requests and wouldn't be restricted by
the number of threads being used. The problem that was pointed out to
them, which they then didn't address, is that where one is handling
more concurrent requests, the transient memory requirements of your
process can theoretically be greater. At least where you have a set
number of threads, you can get a handle on what maximum memory usage
may be by looking at the maximum transient requirements of your worst
request handler. With an asynchronous model, where theoretically an
unbounded number of concurrent requests could be handled at the same
time, you could really blow out your memory requirements if they all
hit the same memory-hungry request handler at the same time. A more
traditional synchronous model can thus give you more predictability,
which for large systems is in itself an important consideration.
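The predictability argument reduces to a simple bound: with a fixed pool of N threads, peak transient memory is at most N times the worst handler's transient usage, whereas unbounded async concurrency has no such cap. The figures below are illustrative:

```python
# Fixed thread pool: worst-case transient memory is bounded by
#   peak <= num_threads * worst_handler_transient
# Unbounded async: the same handler hit by M clients at once needs
#   M * worst_handler_transient, for arbitrarily large M.
worst_handler_transient_mb = 25   # e.g. a report that builds a big list

num_threads = 20                  # fixed synchronous worker threads
sync_peak = num_threads * worst_handler_transient_mb
print(f"sync worst case: {sync_peak} MB")  # known up front

for concurrent in (100, 1000):    # async: concurrency is unbounded
    print(f"async with {concurrent} in flight: "
          f"{concurrent * worst_handler_transient_mb} MB")
```

The synchronous number can be capacity-planned in advance; the asynchronous one depends on whatever concurrency the outside world throws at you, unless the application adds its own concurrency limit.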

Anyway, this is getting a fair bit off topic and since others are
seeing my rambles as such, I'll try and refrain in future. :-)

Graham


You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en