No locking, no preemptive multithreading [Re: Threading improvements]

Nicola Larosa Sun, 06 Apr 2008 00:02:09 -0700

Malcolm made great points about the locking problem (I collected them
below, for reference). I'd like to add some more context.


At the system level, the kernel uses preemption between processes, each
with a distinct memory space: that's a safe model (until they interact
via the file system, database or other means).

When you have multiple threads of execution in the same memory space, you
may use cooperation instead: so-called "green" threads, or coroutines. They
are available to Python in Stackless, and via the greenlet module from the
py lib, developed by the PyPy team:

py.magic.greenlet: Lightweight concurrent programming
http://codespeak.net/py/dist/greenlet.html
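
As a rough illustration of cooperative switching, here is a minimal sketch
that uses plain Python generators as stand-ins for greenlets or Stackless
tasklets (this is not the greenlet API). The names worker and run are
hypothetical. Each task runs until it explicitly yields, so no other task
can interleave with it between two yield points:

```python
from collections import deque


def worker(name, steps, log):
    # Runs until it voluntarily yields; nothing can interrupt it between
    # two yield points, so touching shared state there needs no lock.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run(tasks):
    # Round-robin scheduler: resume each task in turn until all finish.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

The interleaving is fixed entirely by where the tasks yield, which is
exactly what preemptive threads cannot promise.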

However, preemption and shared memory are like nitric acid and glycerin:
when combined, they make something that is sometimes useful, but horribly
dangerous. Trying to paper over the problem by introducing locking only
makes it worse.

Unfortunately, Python threads are preemptive too: they are native OS
threads, and the interpreter may switch between them at almost any point.

Some measure of sanity is restored by thread-local storage (threading.local,
already used in Django), which gives each thread its own copy of the data,
moving back toward the process model.
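
A minimal sketch of thread-local storage using the standard library's
threading.local; the names context, handle and results are hypothetical.
Each thread sees only the attributes it set itself, so the workers never
observe the main thread's value, and vice versa:

```python
import threading

# One thread-local object: every thread gets its own set of attributes.
context = threading.local()
context.user = "main"  # this attribute exists only in the main thread

results = [None] * 4  # each worker writes only to its own slot


def handle(slot):
    # The main thread's attribute is not visible from this thread.
    inherited = getattr(context, "user", None)
    context.user = f"worker-{slot}"  # private to this thread
    results[slot] = (inherited, context.user)


threads = [threading.Thread(target=handle, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(context.user)  # the workers never touched the main thread's copy
```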

Another useful strategy is to make threads communicate only via Queues.
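
A minimal sketch of the queue-based style, using the standard library's
queue.Queue (the function and variable names are hypothetical). The worker
shares no mutable state with the main thread; all data passes through two
queues, which do their own internal locking:

```python
import queue
import threading


def square_worker(inbox, outbox):
    # Owns no shared state: it only reads from inbox and writes to outbox.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            break
        outbox.put(item * item)


inbox = queue.Queue()
outbox = queue.Queue()
t = threading.Thread(target=square_worker, args=(inbox, outbox))
t.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)  # tell the worker to stop
t.join()

squares = [outbox.get() for _ in range(5)]
print(squares)  # [0, 1, 4, 9, 16]
```

With a single worker and FIFO queues the results come back in order; with
several workers, each result would need to carry its input if order
matters.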

Here are more details:

The problem with threads
http://blogs.sun.com/rvs/entry/the_problem_with_threads

I collected a few nuggets of wisdom, for your amusement:

http://www.teknico.net/misc/fortune/concurrency.en.txt

Sun (in Java) and Microsoft (in Windows) foisted this sorcerer's
apprentice stuff on the world, doing us all a disservice. It is up to us
to recognize it for what it is, and minimize its usage.

Please do yourself and the world a favor: don't use preemptive
multithreading for concurrency, if it can at all be avoided.


Malcolm Tredinnick wrote:
> I don't see the need for Django providing inter-process locking, since
> it's unnecessary complexity, in the scheme of things (and not always
> possible -- what if your views are running on different machines?). The
> current lock-free approach appears provably correct and seems useful,
> providing you have the correct database constraints in place.
> ...
> I don't think we need general locking primitives in Django because
> requiring them is something that can be designed around and usually
> makes the design stronger. Plus, getting locking primitives both
> functionally complete and correct is unbelievably hard (since you
> shouldn't rule out the most efficient ways of running websites:
> multiple processes and multiple machines). That's why I'm arguing in
> favour of generally lock-free algorithms that use the database as the
> synchronisation point.
> ...
> The very strong argument in designing "shared nothing" styles of
> architecture, particularly for web services, is to enable proper
> scalability without penalising smaller cases with the overhead and
> without leaking abstractions into user code (proper locking management
> will absolutely require client code to call it, which leaks the whole
> locking and synchronisation structure into code that we should be
> trying to insulate).
> ...
> Don't rely on locking more than absolutely necessary. They don't go as
> far as to say "design around it", since it's a paper on locking and they'd
> like to remain relevant, but designing lock-free algorithms (which is
> really the ideal solution here) is a much bigger area of computer
> science these days with distributed systems.
> ...
> The critical part of this pattern is how expensive the is_lock() call
> might be. Locking, even when there's no contention, on distributed
> systems -- processes or machines -- isn't free and is easy to starve
> for resources if you don't correctly release it and easy to deadlock if
> you don't do it in the right order (and adding deadlock detection to
> avoid that isn't free either).
> ...
> There's a very real risk with allowing locking to creep up into client
> code: the rest of the process becomes hostage to that code behaving
> correctly. If you take the lock and don't release it, or you take
> multiple locks in the wrong order, it's resource starvation time. And
> it's not just the request/response path that matters here, so whilst
> putting a lock reaper in the response path will be a good idea for any
> locking stuff, it isn't a panacea. Any scripts (e.g. cronjobs) also
> have to obey the rules and they can be run from anywhere. Locking
> primitives are a dangerous thing to add to large code bases.
>
> I firmly believe (which I think must be obvious from this post and my
> last one) that if we can avoid introducing the necessity for this, it
> will save everybody -- maintainers and third-party developers alike --
> a lot of trouble down the track. Locking done right means it works in
> all cases, otherwise, as Craig pointed out, it's a delusion. Solving
> these problems, though, doesn't necessarily require locking at a level
> above the database, so we aren't obliged to walk down that road.
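
Malcolm's suggestion of using the database as the synchronisation point,
with the right constraints in place, can be sketched with the standard
library's sqlite3 module standing in for whatever database a Django project
actually uses. The tables and the register/reserve helpers are hypothetical.
Neither helper takes an application-level lock: a primary-key constraint
rejects the duplicate insert, and a conditional UPDATE acts as an atomic
compare-and-set. The calls below run sequentially, so this only illustrates
the pattern; real concurrency guarantees depend on the database and its
isolation level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (username TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO item (id, stock) VALUES (1, 1)")
conn.commit()


def register(conn, username):
    # No "check, then insert" window: the PRIMARY KEY constraint decides.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False  # the name was already taken


def reserve(conn, item_id):
    # Conditional UPDATE: the stock test and the decrement are one statement.
    with conn:
        cur = conn.execute(
            "UPDATE item SET stock = stock - 1 WHERE id = ? AND stock > 0",
            (item_id,),
        )
    return cur.rowcount == 1  # 0 rows changed means nothing was left


registered = (register(conn, "alice"), register(conn, "alice"))
reserved = (reserve(conn, 1), reserve(conn, 1))
print(registered, reserved)  # (True, False) (True, False)
```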

-- 
Nicola Larosa - http://www.tekNico.net/

I've been bitten by [Javascript] mutable built-ins one too many times
to trust that in any language (and that leads to an interesting
disconnect, where Ruby people flock almost exclusively to Prototype -
which does fiddle with the built-ins - and Python people flock almost
exclusively to Mochikit or Dojo). -- James Bennett, June 2006

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
