Re: how to scale (was: how to do something at startup)

Mark Green Sun, 30 Sep 2007 19:51:19 -0700

On Sun, 2007-09-30 at 20:29 -0500, James Bennett wrote:
> On 9/30/07, Mark Green <[EMAIL PROTECTED]> wrote:
> > My question was really only about the former, a much simpler problem:
> > How to keep a tcp connection persistent and re-use it across requests?
> 
> By using a pooling connection manager external to Django. Again,
> complicating the application layer with too many details of the other
> layers in the stack seems -- to me, at least -- like a premature
> optimization that costs flexibility in the long run.
> 
> > While this overhead may be constant in most (not all!) scenarios
> > it's still a waste of resources that doesn't sit well with me.
> 
> You say "waste", I say "trade-off" ;)
> 
> And that's what web development is, really: a series of trades. The
> ability to "hot swap" front-end or back-end nodes by using pooling
> and load balancing external to Django is -- again, to me -- worth the
> trade of a slight increase in overhead, because it means you can bring
> additional nodes into or out of the pool without having to reconfigure
> the application layer.


I fully agree with your picture of the trade-off but I think connection
pooling neither contradicts hot-swapping nor does it introduce any kind
of application layer configuration.

What it does introduce would be additional code complexity, so imho the
argument should be about whether it's worth that or not.

In my opinion it should be, simply because if the protocol were meant to
be used the way django uses it, it would probably run over UDP instead
of TCP. The overhead of creating a new TCP connection, potentially for
each request, should imho not be underestimated.
Just try it on a host receiving 50 hits/sec and up...
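
For anyone who wants numbers from their own box, here is a rough sketch
of such a test (my own illustration, nothing django ships). It starts a
throwaway echo server on localhost and times N round trips with a fresh
connection per request against N round trips over one persistent
connection. All function names are made up for this example, and
loopback hides most of the handshake latency you would see across a
real network, so the gap it reports is likely smaller than in
production.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.StreamRequestHandler):
    """Echo every received line back until the client closes the socket."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


def start_echo_server():
    # Port 0 lets the OS pick a free port; serve_forever runs in a daemon thread.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def round_trip(sock, payload=b"ping\n"):
    """Send one line and read until the echoed newline arrives."""
    sock.sendall(payload)
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    return data


def connect_per_request(address, n):
    """Open and close a new TCP connection for every single request."""
    replies = []
    for _ in range(n):
        with socket.create_connection(address) as sock:
            replies.append(round_trip(sock))
    return replies


def persistent(address, n):
    """Open one TCP connection and reuse it for all requests."""
    with socket.create_connection(address) as sock:
        return [round_trip(sock) for _ in range(n)]


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    server = start_echo_server()
    address = server.server_address
    for func in (connect_per_request, persistent):
        _, seconds = timed(func, address, 200)
        print(f"{func.__name__}: {seconds:.4f}s for 200 round trips")
    server.shutdown()
    server.server_close()
```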

> > I do understand (and endorse very much) that django is a shared nothing
> > architecture but imho that doesn't imply "zero internal persistence
> > across requests".
> 
> Keep in mind also that Django deliberately runs a bit closer to the
> bare HTTP than some of the heavyweight frameworks and that HTTP -- by
> design -- is utterly stateless. Again, it's a trade: inherently
> stateless architectures are tremendously easy to scale across virtual
> or physical machines, and I'd argue that's worth the use of external
> persistence mechanisms when that sort of thing is needed.

I'm not arguing that at all. I don't question the decision to not
internalize certain things, I only propose to make the interface to
these external things as efficient as possible.

There are a few good reasons why connection pooling towards the
database is common practice nowadays. One, I think, is that churning
through a constant stream of short-lived TCP connections is quite
expensive on some architectures and/or databases; another is the
scenario I mentioned: better control in overload situations.
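
To make both points concrete, here is a minimal sketch of a bounded
pool. It is only an illustration under my own assumptions: the names
ConnectionPool and PoolExhausted are hypothetical, and "connect" stands
for whatever callable opens a connection to your backend. Connections
are opened lazily and reused, and because the pool is bounded, a caller
that cannot get a slot within the timeout fails fast instead of piling
yet another connection onto an overloaded backend. Health checks and
reconnecting after errors are left out.

```python
import queue
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection slot frees up within the timeout."""


class ConnectionPool:
    def __init__(self, connect, size):
        self._connect = connect  # hypothetical callable that opens a connection
        self._slots = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._slots.put(None)  # None marks a slot that is not connected yet

    @contextmanager
    def connection(self, timeout=None):
        # Block until a slot is free; timeout=None waits indefinitely.
        try:
            conn = self._slots.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("all connections are busy") from None
        if conn is None:
            try:
                conn = self._connect()
            except Exception:
                self._slots.put(None)  # give the empty slot back
                raise
        try:
            yield conn
        finally:
            # Return the connection for reuse instead of closing it.
            self._slots.put(conn)


if __name__ == "__main__":
    opened = []

    def fake_connect():
        # Stand-in for a real connect call; counts how often we "connect".
        opened.append(object())
        return opened[-1]

    pool = ConnectionPool(fake_connect, size=2)
    for _ in range(100):
        with pool.connection(timeout=1.0) as conn:
            pass  # a real caller would send its query over conn here
    print("requests: 100, connections opened:", len(opened))
```

The LIFO queue hands out the most recently used connection first, so
under light load the pool does not open more connections than it needs.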

> > Further problems arise when you need to integrate with a remote peer
> > that simply depends on persistent connections. My current candidate is
> > the spread toolkit (http://www.spread.org) but it's certainly not the
> > only piece of "environmental software" working that way.
> 
> There have been a couple people lately arguing about cases where
> Django isn't an appropriate solution, and so far I haven't really
> agreed with the examples put forth. But in this thread I think you've
> hit a genuine use case where Django probably isn't what you want: if
> you need high-performance networking with external services, I'd
> highly recommend Twisted[1] as the best Python option I'm aware of.

Yes and no. Yes, I agree django isn't for everything, and no, I don't
really think we're trying to abuse it to that extent.

To elaborate a bit, spread is a messaging framework, like
activemq in the java world, only less broken. ;)

While it can indeed serve as the backbone for grid-style
applications we're only using it for light internode messaging
and a small "common tuplespace", to realize, for example:

- Synchronized list of logged in users across all nodes

- AJAX Chat

- Announcement of asynchronous events (e.g. backend processing in a 
  non-django process finishes) to the user

As you can see, our webapp is at heart really a webapp,
we're not trying to shoehorn django into being a computing
cluster or bittorrent client.

Messaging, as opposed to, say, polling memcached or even the database,
gives us very real performance advantages.

So, I stand by my point; I think it would be nice if django spawned a
set of worker threads on startup, used them for ORM connection pooling
and offered a small API for the developer to take advantage of them.
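
Roughly what I have in mind, as a sketch only: this is not existing
django code, and WorkerPool, Result and the "connect" callable are
names I made up for illustration. Each worker thread is started up
front, opens its own connection exactly once, and then serves tasks
from a shared queue for the life of the process. Error handling for a
failed connect and reconnect logic are omitted.

```python
import queue
import threading


class Result:
    """Tiny handle for fetching a task's outcome from another thread."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._error = None

    def _finish(self, value=None, error=None):
        self._value, self._error = value, error
        self._done.set()

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("task did not finish in time")
        if self._error is not None:
            raise self._error
        return self._value


class WorkerPool:
    def __init__(self, connect, count):
        self._tasks = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, args=(connect,), daemon=True)
            for _ in range(count)
        ]
        for thread in self._threads:
            thread.start()  # workers are spawned at construction time

    def _run(self, connect):
        conn = connect()  # opened once per worker, reused for every task
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown marker
                break
            result, func, args = item
            try:
                result._finish(value=func(conn, *args))
            except Exception as exc:
                result._finish(error=exc)

    def submit(self, func, *args):
        """Queue func(conn, *args) to run on a worker; returns a Result."""
        result = Result()
        self._tasks.put((result, func, args))
        return result

    def shutdown(self):
        for _ in self._threads:
            self._tasks.put(None)
        for thread in self._threads:
            thread.join()


if __name__ == "__main__":
    pool = WorkerPool(connect=lambda: "fake-connection", count=2)
    result = pool.submit(lambda conn, text: f"{text} via {conn}", "hello")
    print(result.get(timeout=5))
    pool.shutdown()
```

A view would call submit() and then get() on the returned handle, so
the request never has to open a connection itself.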

I'll also happily contribute my little custom-thread.py for
inclusion if that helps, but I somehow doubt the django gurus
couldn't do better in less time. ;)


-mark


