Hi,
On Monday 08 June 2009 20:11:02 Nathan Davis wrote:
> I recently came across the PyCon 2009 video of the Kamaelia presentation.
> I'm not too familiar with either Twisted or Kamaelia. Kamaelia seems to be
> an "improvement" over twisted, but I just want to see if my analysis is
> correct.
As some background to Kamaelia, about 7 years ago, I used to work with a
company producing large scale internet software (Inktomi), and had worked in
that field for about 5 years or so. That software was proprietary and built
essentially on a reactor based model. It was also written in C++ (for obvious
reasons), which adds an extra layer of complexity on top.
Whilst I was there, due to large scale real world deployments being the best
test of any network system, I saw many real scenarios which made it clear
that whilst you can get a really good developer to come along and work with
that model, dealing with subtleties can be much much harder than people would
anticipate.
Also, you had this issue, which I suspect occurs in the vast majority of
companies:
* Company decides it wants to work on Project X
* Company assigns "best" developers[1] to work on Project X
    [1] Best is a social metric in most companies, driven as much by
        politics as anything else after all.
* Project X is delivered, Company decides to work on Project Y, "best"
  developers reassigned.
* Developers picking up the pieces of Project X are left with an incomplete
  understanding of the code, because they didn't write it. (Code being an
  expression of thought of a problem solution that lacks the higher
  overview)
* Code remains mainly in this maintenance phase. The simple fact that a
  reactor model can be complex to work with can mean accretion of
  misunderstandings and pain. This happens in any model, unless you focus
  on trying to make maintenance easier.
However, the reality is this issue affects any concurrent system, unless
you try to make it hard to make certain classes of mistakes. ("Oh, I
have to lock that data structure before I use it? I have to set that
flag? Even for reading I need a lock? I didn't realise that?")
The key aspect is that many reactor based systems effectively boil down to
being:
* Do this, and if this happens do that later or do that later.
Which is of course the model you have with state machines. Some reactor models
deal with this better than others, but fundamentally, state machines represent a
model of thinking, but only capture the low level details of the thinking,
not the high level.
For the reactor model of programming, I view twisted as best of breed, having
seen a fair few (proprietary and open). The fact it's in python also helps
because it eliminates certain classes of pain. (Aside from anything else, by
trying to make some aspects of the higher level clearer/more obvious)
However, having seen real world pain caused by this model, and the fact
you do have to have a certain level of ability to use it, I hypothesised that
you could take a different approach to make it simpler.
I've also had a long term view that programming in general will be able to
attack harder problems if we have better tools.
> It seems to me that the major "contribution" of twisted is that it lets you
> do I/O asynchronously. More specifically, it provides a structured way to
> be notified of asynchronous I/O event via callbacks.
This is true. I think it's easy to underplay what it gives you, but this is
true. The reason is that with network systems where you wish to deal with
large numbers of users, avoiding the context switching that comes with the
use of threads has been considered a good idea for a long while. (It's why
stackless's tasklets are popular, and why concurrency rich languages tend to
implement lightweight green threads as well as using OS threads & processes).
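To make the "be notified of asynchronous I/O events via callbacks" idea concrete, here is a minimal sketch of a toy reactor. It is not Twisted's API: the names MiniReactor, on_readable, forget and run_once are made up for illustration, and it is written for current Python 3 using only the standard library's selectors and socket modules.

```python
import selectors
import socket


class MiniReactor:
    """Toy reactor (hypothetical, not Twisted): maps readable sockets to callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def on_readable(self, sock, callback):
        # Remember which callback to run when `sock` has data waiting.
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def forget(self, sock):
        self._selector.unregister(sock)

    def run_once(self, timeout=1.0):
        # Wait for I/O events, then dispatch each one to its callback.
        for key, _events in self._selector.select(timeout):
            callback = key.data
            callback(key.fileobj)

    def close(self):
        self._selector.close()


def demo():
    """Send bytes down one end of a socket pair; a callback collects them."""
    received = []
    reactor = MiniReactor()
    left, right = socket.socketpair()
    left.setblocking(False)
    right.setblocking(False)

    def handle_data(sock):
        received.append(sock.recv(1024))

    reactor.on_readable(right, handle_data)
    left.sendall(b"hello")
    reactor.run_once()

    reactor.forget(right)
    reactor.close()
    left.close()
    right.close()
    return received
```

Nothing here blocks on a single connection: one thread waits on all registered sockets at once, and the "what happens next" logic lives in the callbacks.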
> Kamaelia builds on the ideas of twisted, but:
> 1. It uses (a somewhat limited form of) coroutines instead of events. In
> this way, it's similar to Stackless or Greenlets except it uses standard
> Python. The net result is an event-driven program that looks and feels
> like it's threaded (at least a little bit).
Correct. The recognition is that a limited coroutine in the form of a
generator *is* a state machine, in the same way as a collection of functions
that call each other as deferreds is a state machine. This link can be seen
in the (naive) C++ "mini-axon" version of Kamaelia's core. (That demo code
creates macros which allow you to create C++ classes which look like
python generators, but when passed through cpp become state machines).
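As a rough illustration of that equivalence (this is a sketch for current Python 3, not code taken from Kamaelia or from the C++ mini-axon; both names below are made up), here is the same small piece of logic written twice: once as a generator, and once as a hand-written state machine with explicit state.

```python
def countdown_generator(start):
    """Generator form: the 'state' is simply where execution is paused."""
    n = start
    while n > 0:
        yield n
        n -= 1
    yield "done"


class CountdownStateMachine:
    """Hand-written form of the same logic, with the state made explicit."""

    def __init__(self, start):
        self.n = start
        self.state = "COUNTING"

    def __iter__(self):
        return self

    def __next__(self):
        if self.state == "COUNTING":
            if self.n > 0:
                value = self.n
                self.n -= 1
                return value
            # Counting is over: emit the final value and change state.
            self.state = "FINISHED"
            return "done"
        # FINISHED: nothing more to produce.
        raise StopIteration
```

Both produce the same sequence when stepped, but in the generator the control flow reads top to bottom, whereas in the class you have to reconstruct it from the state variable.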
The choice was taken to specifically use standard python, because the research
question was "can we make concurrency easier for the vast bulk of us to work
with?" (Having built multiple systems, I think we've now answered that to our
satisfaction: "yes".)
> 2. It has a strong component model. Anyway, Kamaelia makes it relatively
> easy to create components and connect them together.
Yes, this is a core aim in Kamaelia. Kamaelia's component model is based on 3
different approaches to componentisation:
* Unix pipelines
* Electronic systems & hardware description languages
* Occam (something I played with many moons ago)
Specifically all 3 of these make it simple (or simpler) to manage concurrency,
and all tend to end up with a focus on "reusable chunks of code" that happen
to communicate over "named things", and don't necessarily know who the
recipient of a message is.
This is of course what enables ls, cat, sed, awk, etc to be as composable in
interesting ways today, 30 years after they were made, with little change -
since this approach allows testing in complete isolation, as well
as unit & integration testing.
The hypothesis here of course is this:
* Many reactor based systems tend to end up having "a chunk of code
  controlling file reading", "a chunk of code handling select", "a chunk of
  code handling new connections", "a chunk of code watching the GUI", and
  communicate through buffers which are not always explicit, and can know
  about each other.
* Recognising that generators can be viewed as equivalent to a collection
  of deferreds, and choosing to make those buffers explicit...
* ... the hypothesis is that you could make something that could
  *potentially* be as efficient, but be a model that is more amenable
  to maintenance and development by a wider set of people.
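A minimal sketch of what "generators plus explicit buffers" can look like follows. This is not Axon's real API: the Component, Producer, Doubler and run names, and the single inbox/outbox per component, are simplifying assumptions for illustration, written for current Python 3.

```python
from collections import deque


class Component:
    """Toy component (hypothetical): a generator plus explicit message buffers."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()

    def main(self):
        raise NotImplementedError


class Producer(Component):
    def __init__(self, items):
        super().__init__()
        self.items = list(items)

    def main(self):
        for item in self.items:
            self.outbox.append(item)
            yield  # hand control back to the scheduler


class Doubler(Component):
    def main(self):
        while True:
            while self.inbox:
                self.outbox.append(self.inbox.popleft() * 2)
            yield  # nothing more to do for now


def run(components, links, steps):
    """Round-robin scheduler. `links` is a list of (source, sink) pairs."""
    tasks = [component.main() for component in components]
    for _ in range(steps):
        # Give every live generator one time slice.
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)
        # Deliver messages along the explicit links.
        for source, sink in links:
            while source.outbox:
                sink.inbox.append(source.outbox.popleft())
```

Linking Producer([1, 2, 3]) to a Doubler and calling run(...) for a handful of steps leaves 2, 4 and 6 in the Doubler's outbox. Neither component knows about the other, only about its own boxes, which is also why a Doubler can be tested on its own by putting a value in its inbox and stepping its generator once.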
(I believe in making my life easier :-)
Having tested this model now for a while and built a number of different
systems with it, specifically involving (relatively) novice
developers as well as more experienced developers in programmes like Google
Summer of Code, I believe this hypothesis to be relatively well proven, so
the current project focus is really on:
* Consolidation of core systems, components and applications
* Optimisation
* Clean up
And just use, rather than research. (It really wasn't clear if it would be
a "more" accessible model when we started)
> This is something lacking in twisted?
I don't believe so. Twisted has evolved over time and does have a component
model. However, if you look in the twisted book, you won't find an explicit
reference to the model - the API docs for it are here:
http://twistedmatrix.com/documents/8.2.0/api/twisted.python.components.html
Specifically they build on Zope3's component model, which is a different sort
of component model. (Once you know this you'll find uses of the component
model in the book with classes named I<Something>.)
From a software/non-concurrent viewpoint, theirs is the more traditional
model - Kamaelia's is closer to the traditional Unix model.
(The two models aren't mutually exclusive by the way)
> So, is this analysis generally correct? Kamaelia is similar to twisted in
> that it is an asynchronous, event-driven framework at its core; but it
> hides the details a little better (and provides a more formal component
> model)?
Yes, I believe your analysis is correct.
Regarding "better" or "worse" I view that as a largely subjective term and
prefer "this works better for me" over any other judgement. I do notice we
tend to get more people saying this than not though :-)
As Gloria mentioned in her talk though, the reaction "it can't be that
simple" comes up, which is why the mini-axon tutorial exists. Also, having
done some recent tests, it should be possible to make some rather
substantial optimisations to Kamaelia's core (Axon), without changes
to Kamaelia applications.
My personal way of viewing things as a result, is that twisted is
very clearly designed for performance from the outset, in a way that matches
the original developer's thinking model, with tools then added to make it
easier to work with.
Kamaelia's model is more based on enabling maintenance of code, to make it
easier to pick up something you've never seen before and to have a high level
roadmap to the code, and then follow through it to understand it, so that you
understand the impact of changes you make.
For example, whilst it's now more complex, the core of this code:
http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Kamaelia/Kamaelia/Experimental/PythonInterpreter.py#132
Should be still recognisable as being equivalent to this code:
    import sys, traceback

    def run_user_code(envdir):
        source = raw_input(">>> ")
        try:
            exec source in envdir
        except:
            print "Exception in user code:"
            print '-'*60
            traceback.print_exc(file=sys.stdout)
            print '-'*60

    envdir = {}
    while 1:
        run_user_code(envdir)
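To give a feel for how the loop above turns into something component shaped, here is a rough sketch. It is not the actual InterpreterTransformer code from the repository: the name interpreter_transformer is made up, it is written for current Python 3 rather than the Python 2 of the snippet above, and like the original it runs whatever it is given, so it should only ever see trusted input.

```python
import contextlib
import io
import traceback


def interpreter_transformer(sources, env=None):
    """For each source string received, run it and yield the text it printed.

    Hypothetical sketch: input arrives as an iterable of strings rather than
    from raw_input, and output is yielded rather than printed to the console.
    """
    env = {} if env is None else env
    for source in sources:
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            try:
                exec(source, env)  # unsafe for untrusted input
            except Exception:
                print("Exception in user code:")
                print("-" * 60)
                traceback.print_exc(file=captured)
                print("-" * 60)
        yield captured.getvalue()
```

The try/exec/except core is unchanged. What changes is only where the source comes from and where the output goes, which is what lets the same logic sit in the middle of a chain of components.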
And these examples:
http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Kamaelia/Examples/PythonInterpreter
such as this:
    Pipeline(
        Textbox(size = (800, 300), position = (100,380)),
        InterpreterTransformer(),
        TextDisplayer(size = (800, 300), position = (100,40)),
    ).run()
Give a clear idea of how the code is expected to be used - something made
possible by the decoupling and linkage being explicit rather than
implicit.
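The composition idea can be sketched in a few lines of plain Python. To be clear, this is not how Kamaelia's Pipeline is implemented (that links the outbox of each component to the inbox of the next and runs them concurrently). The pipeline function and the three stages below are made-up names that simply chain generators, Unix pipe style, in current Python 3.

```python
def pipeline(source, *stages):
    """Feed `source` through each stage in turn, like a Unix pipe."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


def strip_blank(lines):
    # Drop lines that are empty or only whitespace.
    for line in lines:
        if line.strip():
            yield line


def shout(lines):
    # Upper-case every line passing through.
    for line in lines:
        yield line.upper()


def numbered(lines):
    # Prefix each line with its position in the stream.
    for number, line in enumerate(lines, start=1):
        yield "%d: %s" % (number, line)
```

For example, list(pipeline(["hello", "", "world"], strip_blank, shout, numbered)) gives ["1: HELLO", "2: WORLD"]. Each stage only knows about "what comes in" and "what goes out", so stages can be reordered, swapped or tested individually.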
Having this formal component model therefore makes composition of interesting
systems simpler, and more explicit. Using python generators as our core unit
of concurrency and core component type encourages generally simpler/smaller
components which in turn lend themselves to reuse.
In retrospect, I think full blown co-routines would actually have
hampered the project in the first place, because they would have encouraged
the use of larger components, which would perhaps have led to less
reuse.
There is one other difference between Twisted and Kamaelia though. Twisted's
core focus has generally been network systems. Kamaelia's original use case
was network systems, but its core focus is on general systems to be
implemented concurrently.
This means tools relating to transcoding TV or user uploaded images & videos
are as valid/appropriate tasks for Kamaelia as a greylisting mail server, an
IRC bot, a collaborative whiteboard, or a tool based on gesture recognition &
speech synthesis designed for teaching a child to read and write, along with
database modelling tools. (all things built with Kamaelia)
At the end of the day though, they're both just programming models. They
aren't mutually exclusive, and picking the right model for the right job is
more important than any other consideration :-)
Regards,
Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home