On Wednesday 30 September 2009 20:56:59 erisian wrote:
> Now that I have the basic message passing and blocking policy working,
> I'm investigating how to integrate my little lib with asynchronous
> events in a (relatively) portable manner.
Hopefully what follows here is useful. I'm summarising the sorts of
things / thinking we followed with kamaelia.
> One of the most interesting and portable solutions seems to be
> 'libev'. It's a high-performance c-library that allows you to
> integrate asynchronous timer, device and other events into your
> application. It integrates with the operating system's select/poll
> subsystem and allows a lot of customization including the integration
> of co-routine subsystems (very similar to axon).
>
> Part of the trick is figuring out how to intersperse co-routine yields
> with async polls so that the whole system only blocks when all of the
> co-routines are WAITING, but then wakes up the co-routine scheduler
> when an async event puts any co-routine back into run mode. That has
> to be balanced with periodic checks with the async status so that
> components that require async feeds don't starve.
This is similar to one of the original use cases of Axon - which was to build
a network server system. As may expect this developed relatively organically
at first.
Rather than just answer "this is how we do it", I'll explain how we got there,
which is perhaps more interesting/useful. I don't know if it's a good fit for
you, but hopefully useful.
My initial approach was to create what we called a "PrimaryListenerSocket"
(PLS) component, which listened on a port, and used to select to check for
new connections. It would accept them and create a basic connected socket
adapter (CSA).
At that time the CSA was much more tightly bound to the protocol than it is
today, and the select loop was tied up in the PLS. Naturally at the same time
we built a basic component for handling client side aspects of the server. At
that point in time, due to limited functionality in python I asked a q on
comp.lang.python about how best to go about things:
You'll be able to see the outline of that code still in kamaelia today.
The next step was seeking to build out a proxy. My aim here was to evolve out
the necessary code from actual need, rather than from theory. Effectively
like TDD or BDD. The issue that arose from the above very simple design was
the fact that running 2 servers, or running clients and servers in the same
code would result in multiple concurrent calls to select. (ie select in one,
then another, then another)
That's clearly a bad design, but was expected. The design of a proxy led to
this issue:
* How does a TCPClient component know of the availability of something
that does select?
* How does a TCPServer reuse someone else's select?
After all, we didn't want to have components with information hardwired if
possible. (Open to the idea, but wanted to avoid it if possible)
This led to the realisation that Select itself was a thing that:
* Needed to be a component
* Needed to be findable in the wider system
Like many of the metaphors in kamaelia which are based on the real world, I
asked the question "How do I find such a thing?". To her credit, the answer
to this came from my _mum_ of all places. (Not because my mum knows how to
develop software, but because I was using organisations as a starting
metaphor, and those are familiar)
I explained the problem, and she simply turned round and said "you'd ask
someone wouldn't you?". The sort of person I'd ask at work is called a team
assistant, and I figured that this would be useful for tracking what we
decided to call services, and would effectively co-ordinate things and it was
a friday, this caused the creation of the co-ordinating assistant tracker
(aka CAT :-).
(To my mind this is an advantage - shifting the problem domain to a more human
one made it possible for a non-programmer to help solve what is essentially a
concurrency issue, without locking being part of the solution...)
This caused the creation of a fundamental concept in kamaelia - a service.
A service is a tuple consisting of a (component, inboxname) pair. This allows
someone to create a linkage to that and send messages to it.
This resulted in refactoring the TCP subsystem being changed to the structure
we have today, visually described here:
* http://www.kamaelia.org/Docs/NotationForVisualisingAxon
Textually though:
* TCPServer component. Creates a listening socket. Asks for access to a
selector service. If this doesn't exist, one is created along the way,
and returned. The TCPServer then sends a message to the Selector
saying "wake me when there's something for me to do".
After this, when the TCPServer gets a message saying there's activity on
the socket, this means there's a connection waiting so it accepts the
connection. It creates a CSA and passes the CSA the connected socket. It
also sends a message out to whomever created the TCPServer containing
the CSA.
* Connected Socket Adapter. This again is registered with the Selector,
and woken when there's something for it to do. This just reads/writes
data to/from in & outboxes.
* TCPClient essentially goes through the steps to create a connected
socket, and then hands off the mechanics of dealing with it to a CSA.
(This means all the things that can go wrong are handled in one place)
* The selector component waits for messages saying "NewReader/Writer" etc
or "RemoveReader/Writer". It creates an outbox to reply to the sender,
and stores a mapping of socket to "who owns the socket & how to talk to
them". Based on these messages it adds them to the set of sockets to
wait on in the select statement. When there's activity on them it sends
a message to the appropriate outbox to say there's something to do.
This lot has a few implications:
* It clearly delineates ownership of problems & stuff to do:
- accepting connections
- dealing with active connections
- making connections
- monitoring for activity
-> This ownership actually makes debugging significantly simpler :-)
* The selector component is really a connection monitor. The fact it uses
select is actually irrelevant. Clearly we could use epoll or similar
here. In particular, you could replace select here with libevent.
When this lot was originally written, these were all standard generator
components. The way that the system therefore worked was to eat CPU running
select in a non-blocking fashion. This is actually not a bad strategy if you
want to ensure your process eats CPU and has maximum responsiveness. It
starves all other processes on the system in a rather hostile fashion. In our
original usecase, this wasn't considered to be a problem particularly.
As a result the next step in the evolution of the system was for us to need to
put a TCPClient onto a Series 60 mobile phone. This was an issue because at
that point in time python for series 60 did not have a select statement. As a
result, the TCPClient was rewritten as a threaded TCPClient (Which is why
both exist). This created our first threaded component.
This threaded component was created such that it uses 2 threadsafe queues, and
a generator. The generator was a private part of the threaded component which
when scheduled by the scheduler would perform a similar task to the postman.
Specifically it would take messages from inboxes and place them into the
inbound Queue.Queue, and take messages from the outbound Queue.Queue, and
place them in the outbox. Such components can also sleep - either based on a
timeout waiting for an event, or just using time.sleep.
At this point in time we also still had an actual postman microprocess which
would visit the components registering linkages with it, checking for
deliveries and performing them.
We then had a problem which changed Kamaelia's dynamics/needs.
Specifically, we needed to be able to take data from a video data source, and
pipe it out through a command line encoder, and take the data back in and do
stuff with it.
That meant that in order to cope with throughput, the postman was optimised
out, as you've seen and we changed to a direct delivery model.
It also meant that the Selector component could no longer burn CPU. So the
questions arose:
* How to maintain responsiveness
* How not to eat CPU
That same work also made the whiteboard a usable application. (The whiteboard
was written around the same time as these changes were made)
The solution here was to put the Selector into a threadedcomponent, and allow
it to be called with a timeout that would cause it to sleep if there was
nothing to do. You'll see that this remains to this day here:
read_write_except = select.select(readers, writers, exceptionals,0.05)
Not the world's largest timeout, but not the smallest either. 20 calls per
second seems reasonable when nothing is happening. (This keeps the selector
responsive to a shutdown message)
At that point in time, it becomes necessary for the scheduler to be able to
sleep - which is fundamentally what you see in the scheduler to do with
Queue.Queue requests inside there.
The upshot of this is this:
* The TCPServer starts up, initialises a Selector, goes to sleep
* Selector starts up, goes to sleep waiting for stuff to happen
* Scheduler starts up, goes to sleep
Connection arrives.
* Selector's select call is interrupted, and results in listen socket
being ready.
* Selector sends message to TCPServer waking it up.
* The wake up to TCPServer causes Scheduler.wakeThread to be called,
waking the scheduler.
Despite 2 threads being used this is ensured to be threadsafe, due to using
threadsafe queues for comms, in a threadsafe manner.
Currently, we haven't done the specific logical next possible step which is to
optimise the /scheduler/, though a small amount of work is going on there
when I get the chance. You can see that work here:
* http://code.google.com/p/kamaelian/source/browse/sketches/Axon/
Unlike the current system this is an O(1) scheduler rather than round-robin.
It's more akin to the way a twisted reactor works, without being tied to a
particular waking mechanism. It's also designed for running Axon type things
rather than twisted type things (obviously). There are similarities for
anyone who knows both though.
The key advantage of this approach of doing O(1) is for performance reasons.
Some tests I've done of this are between 10 and 50 times faster than the
equivalent current kamaelia/axon approach. In a more or less code compatible
way.
Specifically, being round robin means kamaelia's scheduler's scaling with
processes is currently O(n), which whilst not as bad as threads, is less than
perfect. O(1) is much nicer, and on the "to be done" list :-)
The reason such performance enhancements aren't the top of my priority list
though is because kamaelia works fast enough for me at present, but if you're
designing a new system, it's probably useful to see.
It's possible that a problem at work I'm working on at the moment though may
require that change to be done. (My preferred way of prioritising kamaelia
work! :-)
Anyway, picking this apart, the things that I'll note here that we've found
useful when integrating with an event system:
* Wake on sender recieve
* Direct delivery
* Blocking code in a thread. (eg select with a timeout) This can aid
responsiveness and throughput.
* Allow the scheduler to sleep, and to be woken by threads.
* Being naive with threads initially is a good idea.
The other 4 key points I'd make:
* Today I'd rewrite the CAT to use the STM code directly. You'll note that
I do use the STM code in the Axon2 rewrite for this sort of purpose.
* Use optimistic concurrency all the way! :-) (STM + message passing rather
than locks)
* The concept of services - (component, inbox) - appears to be very
powerful in the same way that an IP domain socket which is an (IP, port)
pair is powerful.
* We currently default to being active, and explicitly do a self.pause()
before sleeping. I suspect based on what's been written that this may be
the wrong way round. (Perhaps simplifying code, and increasing
throughput) This is based on the O(1) scheduler work, which implies this.
That said, unlike pypes.org's work, and unlike filterpype, the current set up
does not limit our topologies to acyclic graphs. Indeed, we have pipelines
and graphlines of all kinds with cycles in, which turn out to be very useful.
(Would be very interesting to integrate with those two libraries though -
especially pypes)
Concepts I'm considering:
* Direct plumbing of Queue.Queues between components where Queue.Queues are
needed. (eliminate the internal generator, but at the risk of increasing
complexity)
* Allowing the use of Queue.Queues to implement ad-hoc load balancing in
the case of multiple consumers.
* Replacing the CAT's implementation with an STM store.
These aren't urgent from my view, because we have working code and working
systems, but would be better design decisions overall IMO.
> As soon as I decode the developer information on the co-routine
> integration I'll post back here with what I find.
Please do. I'm finding this really interesting & useful!
Best Regards,
Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"kamaelia" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/kamaelia?hl=en
-~----------~----~----~----~------~----~------~--~---