[kamaelia-list] Re: Porting Axon Architecture to D

Michael Sparks Sun, 04 Oct 2009 10:25:56 -0700

On Wednesday 30 September 2009 20:56:59 erisian wrote:
> Now that I have the basic message passing and blocking policy working,
> I'm investigating how to integrate my little lib with asynchronous
> events in a (relatively) portable manner.


Hopefully what follows here is useful. I'm summarising the sorts of
things / thinking we followed with kamaelia.

> One of the most interesting and portable solutions seems to be
> 'libev'.  It's a high-performance c-library that allows you to
> integrate asynchronous timer, device and other events into your
> application.  It integrates with the operating system's select/poll
> subsystem and allows a lot of customization including the integration
> of co-routine subsystems (very similar to axon).
> 
> Part of the trick is figuring out how to intersperse co-routine yields
>  with async polls so that the whole system only blocks when all of the
> co-routines are WAITING, but then wakes up the co-routine scheduler
> when an async event puts any co-routine back into run mode.  That has
> to be balanced with periodic checks with the async status so that
> components that require async feeds don't starve.

This is similar to one of the original use cases of Axon - which was to build
a network server system. As you may expect, this developed relatively
organically at first.

Rather than just answer "this is how we do it", I'll explain how we got there, 
which is perhaps more interesting/useful. I don't know if it's a good fit for 
you, but hopefully useful.

My initial approach was to create what we called a "PrimaryListenerSocket"
(PLS) component, which listened on a port, and used select to check for
new connections. It would accept them and create a basic connected socket
adapter (CSA).

At that time the CSA was much more tightly bound to the protocol than it is 
today, and the select loop was tied up in the PLS. Naturally at the same time 
we built a basic component for handling client side aspects of the server. At 
that point in time, due to limited functionality in python, I asked a question
on comp.lang.python about how best to go about things.

You'll be able to see the outline of that code still in kamaelia today.

The next step was seeking to build out a proxy. My aim here was to evolve out 
the necessary code from actual need, rather than from theory. Effectively 
like TDD or BDD. The issue that arose from the above very simple design was 
the fact that running 2 servers, or running clients and servers in the same 
code would result in multiple concurrent calls to select. (ie select in one, 
then another, then another)

That's clearly a bad design, but was expected. The design of a proxy led to 
this issue:

    * How does a TCPClient component know of the availability of something
      that does select?
    * How does a TCPServer reuse someone else's select?

After all, we didn't want to have components with information hardwired if 
possible. (Open to the idea, but wanted to avoid it if possible)

This led to the realisation that Select itself was a thing that:
    * Needed to be a component
    * Needed to be findable in the wider system

Like many of the metaphors in kamaelia which are based on the real world, I 
asked the question "How do I find such a thing?". To her credit, the answer 
to this came from my _mum_ of all places. (Not because my mum knows how to 
develop software, but because I was using organisations as a starting 
metaphor, and those are familiar)

I explained the problem, and she simply turned round and said "you'd ask 
someone wouldn't you?". The sort of person I'd ask at work is called a team 
assistant, and I figured that something like this would be useful for 
tracking what we decided to call services, and would effectively co-ordinate 
things. It was a Friday, and this led to the creation of the co-ordinating 
assistant tracker (aka CAT :-).

(To my mind this is an advantage - shifting the problem domain to a more human 
one made it possible for a non-programmer to help solve what is essentially a 
concurrency issue, without locking being part of the solution...)

This caused the creation of a fundamental concept in kamaelia - a service.

A service is a tuple consisting of a (component, inboxname) pair. This allows 
someone to create a linkage to that and send messages to it.
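
To make the idea concrete, here is a minimal sketch of a service lookup. It
is not the actual CAT or Axon API: the names Component, ServiceRegistry,
register, lookup and send are all hypothetical, and inboxes are plain lists
for simplicity.

```python
# Hypothetical sketch, NOT the Kamaelia/Axon API: a service is stored as a
# (component, inboxname) pair and looked up by name.

class Component:
    """Stand-in for a component; inboxes are plain lists here."""
    def __init__(self, name):
        self.name = name
        self.inboxes = {"inbox": [], "control": []}


class ServiceRegistry:
    """Maps a service name to a (component, inboxname) pair."""
    def __init__(self):
        self._services = {}

    def register(self, name, component, inboxname):
        if inboxname not in component.inboxes:
            raise KeyError("no such inbox: %r" % inboxname)
        self._services[name] = (component, inboxname)

    def lookup(self, name):
        # Returns the (component, inboxname) pair, or None if unknown.
        return self._services.get(name)

    def send(self, name, message):
        # Deliver a message to whichever inbox backs the named service.
        component, inboxname = self._services[name]
        component.inboxes[inboxname].append(message)


registry = ServiceRegistry()
selector = Component("selector")
registry.register("selector", selector, "inbox")
registry.send("selector", "NewReader")
found = registry.lookup("selector")
missing = registry.lookup("no-such-service")
```

The point is that a client only needs the service name. It never needs a
hardwired reference to the component providing it.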

This resulted in the TCP subsystem being refactored into the structure we 
have today, visually described here:
    * http://www.kamaelia.org/Docs/NotationForVisualisingAxon

Textually though:
    * TCPServer component. Creates a listening socket. Asks for access to a
      selector service. If this doesn't exist, one is created along the way,
      and returned. The TCPServer then sends a message to the Selector
      saying "wake me when there's something for me to do".
      After this, when the TCPServer gets a message saying there's activity on
      the socket, this means there's a connection waiting so it accepts the
      connection. It creates a CSA and passes the CSA the connected socket. It
      also sends a message out to whomever created the TCPServer containing
      the CSA.

    * Connected Socket Adapter. This again is registered with the Selector,
      and woken when there's something for it to do. This just reads/writes
      data to/from in & outboxes.

    * TCPClient essentially goes through the steps to create a connected
      socket, and then hands off the mechanics of dealing with it to a CSA.
      (This means all the things that can go wrong are handled in one place)

    * The selector component waits for messages saying "NewReader/Writer" etc
      or "RemoveReader/Writer". It creates an outbox to reply to the sender,
      and stores a mapping of socket to "who owns the socket & how to talk to
      them". Based on these messages it adds them to the set of sockets to
      wait on in the select statement. When there's activity on them it sends
      a message to the appropriate outbox to say there's something to do.
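
The selector's bookkeeping described above can be sketched roughly as
follows. This is an illustration only, not the Kamaelia Selector: the
MiniSelector class and its method names are made up, only readers are
handled, and the "owner" is just a list standing in for an inbox.

```python
# Hypothetical sketch of the selector's socket -> owner mapping.
import select
import socket


class MiniSelector:
    def __init__(self):
        self.readers = {}  # socket -> list acting as the owner's inbox

    def add_reader(self, sock, owner_inbox):
        # Equivalent in spirit to handling a "NewReader" message.
        self.readers[sock] = owner_inbox

    def remove_reader(self, sock):
        # Equivalent in spirit to handling a "RemoveReader" message.
        self.readers.pop(sock, None)

    def poll(self, timeout=0.05):
        """Wait up to `timeout` seconds, then notify owners of ready sockets."""
        if not self.readers:
            return 0
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        for sock in ready:
            self.readers[sock].append(("ready", sock))
        return len(ready)


a, b = socket.socketpair()
owner_inbox = []
sel = MiniSelector()
sel.add_reader(a, owner_inbox)
idle = sel.poll(0)       # nothing has been written yet
b.sendall(b"x")
woken = sel.poll(1.0)    # now `a` is readable
a.close()
b.close()
```

Swapping select for epoll or similar would only change the body of poll; the
mapping and the messages stay the same.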

This lot has a few implications:

   * It clearly delineates ownership of problems & stuff to do:
       - accepting connections
       - dealing with active connections
       - making connections
       - monitoring for activity
       -> This ownership actually makes debugging significantly simpler :-)

   * The selector component is really a connection monitor. The fact it uses
      select is actually irrelevant. Clearly we could use epoll or similar
      here. In particular, you could replace select here with libevent.

When this lot was originally written, these were all standard generator 
components. The way that the system therefore worked was to eat CPU running 
select in a non-blocking fashion. This is actually not a bad strategy if you 
want to ensure your process eats CPU and has maximum responsiveness. It 
starves all other processes on the system in a rather hostile fashion. In our 
original usecase, this wasn't considered to be a problem particularly.

The next step in the evolution of the system came when we needed to put a 
TCPClient onto a Series 60 mobile phone. This was an issue because at that 
point in time python for series 60 did not have a select statement. As a 
result, the TCPClient was rewritten as a threaded TCPClient (Which is why 
both exist). This created our first threaded component. 

This threaded component was created such that it uses 2 threadsafe queues, and 
a generator. The generator was a private part of the threaded component which 
when scheduled by the scheduler would perform a similar task to the postman. 
Specifically it would take messages from inboxes and place them into the 
inbound Queue.Queue, and take messages from the outbound Queue.Queue, and 
place them in the outbox. Such components can also sleep - either based on a 
timeout waiting for an event, or just using time.sleep.
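
A rough sketch of that arrangement, assuming Python 3 (where Queue.Queue is
queue.Queue). The ThreadedUppercaser class, its bridge generator and the
list-based inbox/outbox are invented for illustration and are not the Axon
threadedcomponent implementation.

```python
# Hypothetical sketch: a thread talks to the generator world through two
# threadsafe queues, with a small generator doing the copying.
import queue
import threading
import time


class ThreadedUppercaser:
    def __init__(self):
        self.inbox = []                    # filled by other components
        self.outbox = []                   # read by other components
        self.to_thread = queue.Queue()     # inbound queue
        self.from_thread = queue.Queue()   # outbound queue
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Runs in its own thread, so blocking here is fine.
        while True:
            msg = self.to_thread.get()
            if msg is None:                # None is our shutdown marker
                break
            self.from_thread.put(msg.upper())

    def bridge(self):
        """Generator the scheduler would run; plays the postman's role."""
        while True:
            while self.inbox:
                self.to_thread.put(self.inbox.pop(0))
            while True:
                try:
                    self.outbox.append(self.from_thread.get_nowait())
                except queue.Empty:
                    break
            yield


comp = ThreadedUppercaser()
comp.thread.start()
comp.inbox.append("hello")
steps = comp.bridge()
deadline = time.time() + 2.0
while not comp.outbox and time.time() < deadline:
    next(steps)          # one scheduler "tick"
    time.sleep(0.01)
comp.to_thread.put(None)
comp.thread.join(timeout=2.0)
```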

At this point in time we also still had an actual postman microprocess which 
would visit the components registering linkages with it, checking for 
deliveries and performing them.

We then had a problem which changed Kamaelia's dynamics/needs.

Specifically, we needed to be able to take data from a video data source, and 
pipe it out through a command line encoder, and take the data back in and do 
stuff with it.

That meant that in order to cope with throughput, the postman was optimised 
out, as you've seen and we changed to a direct delivery model.

It also meant that the Selector component could no longer burn CPU. So the  
questions arose:
   * How to maintain responsiveness
   * How not to eat CPU

That same work also made the whiteboard a usable application. (The whiteboard 
was written around the same time as these changes were made)

The solution here was to put the Selector into a threadedcomponent, and allow 
it to be called with a timeout that would cause it to sleep if there was 
nothing to do. You'll see that this remains to this day here:
    read_write_except = select.select(readers, writers, exceptionals,0.05)

Not the world's largest timeout, but not the smallest either. 20 calls per 
second seems reasonable when nothing is happening. (This keeps the selector 
responsive to a shutdown message)
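
As a minimal sketch of why the timeout matters, the loop below blocks in
select for at most 0.05s at a time, so it can notice a shutdown request
without burning CPU. The function selector_loop, the on_ready callback and
the use of threading.Event for shutdown are assumptions for illustration;
the real Selector is told to shut down by message.

```python
# Hypothetical sketch: select with a timeout inside a thread.
import select
import socket
import threading


def selector_loop(readers, shutdown, on_ready, timeout=0.05):
    # Sleeps inside select for up to `timeout` seconds, then re-checks
    # the shutdown flag; roughly 20 wake-ups per second when idle.
    while not shutdown.is_set():
        ready, _, _ = select.select(readers, [], [], timeout)
        for sock in ready:
            on_ready(sock)


a, b = socket.socketpair()
shutdown = threading.Event()
got_data = threading.Event()
received = []


def on_ready(sock):
    received.append(sock.recv(1024))
    got_data.set()


worker = threading.Thread(target=selector_loop, args=([a], shutdown, on_ready))
worker.start()
b.sendall(b"ping")
got_data.wait(2.0)
shutdown.set()           # noticed within one timeout period
worker.join(2.0)
a.close()
b.close()
```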

At that point it became necessary for the scheduler to be able to sleep - 
which is fundamentally what the Queue.Queue requests you see inside the 
scheduler are for.

The upshot of this is this:
    * The TCPServer starts up, initialises a Selector, goes to sleep
    * Selector starts up, goes to sleep waiting for stuff to happen
    * Scheduler starts up, goes to sleep

Connection arrives.
    * Selector's select call is interrupted, and results in listen socket
      being ready.
    * Selector sends message to TCPServer waking it up.
    * The wake up to TCPServer causes Scheduler.wakeThread to be called,
       waking the scheduler.

Despite 2 threads being used, this is threadsafe, because all comms between 
them go through threadsafe queues.
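
The sleep/wake handshake can be sketched like this, again assuming Python 3.
SleepyScheduler and wake_thread are hypothetical names (loosely echoing
Scheduler.wakeThread), and "running" a component is reduced to logging its
name.

```python
# Hypothetical sketch: a scheduler that sleeps on a threadsafe queue and is
# woken by another thread.
import queue
import threading


class SleepyScheduler:
    def __init__(self):
        self.wake_queue = queue.Queue()  # the only cross-thread channel
        self.runnable = []
        self.log = []

    def wake_thread(self, component_name):
        """Safe to call from any thread."""
        self.wake_queue.put(component_name)

    def run(self, max_wakes):
        handled = 0
        while handled < max_wakes:
            if not self.runnable:
                # Nothing to run: block here instead of spinning.
                self.runnable.append(self.wake_queue.get())
            name = self.runnable.pop(0)
            self.log.append(name)        # stand-in for running the component
            handled += 1


sched = SleepyScheduler()
# Simulate the selector thread noticing a connection a little later.
timer = threading.Timer(0.05, sched.wake_thread, args=("TCPServer",))
timer.start()
sched.run(max_wakes=1)
timer.join()
```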

Currently, we haven't done the specific logical next possible step which is to 
optimise the /scheduler/, though a small amount of work is going on there 
when I get the chance. You can see that work here:
    * http://code.google.com/p/kamaelian/source/browse/sketches/Axon/

Unlike the current system this is an O(1) scheduler rather than round-robin. 
It's more akin to the way a twisted reactor works, without being tied to a 
particular waking mechanism. It's also designed for running Axon type things 
rather than twisted type things (obviously). There are similarities for 
anyone who knows both though.

The key advantage of the O(1) approach is performance. In some tests I've 
done it is between 10 and 50 times faster than the equivalent current 
kamaelia/axon approach, in a more or less code compatible way.

Specifically, being round robin means kamaelia's scheduler's scaling with 
processes is currently O(n), which whilst not as bad as threads, is less than 
perfect. O(1) is much nicer, and on the "to be done" list :-)
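
To illustrate the difference in cost (this is not the code from the sketches
directory, just a toy comparison with made-up function names): a round-robin
pass has to visit every component, whereas a run queue only touches the ones
that have been woken.

```python
# Hypothetical toy comparison of scheduling cost per pass.
from collections import deque


def round_robin_pass(components):
    """Visit every component; the cost grows with the total number."""
    visited = 0
    ran = []
    for name, awake in components.items():
        visited += 1
        if awake:
            ran.append(name)
    return visited, ran


def run_queue_pass(run_queue):
    """Visit only woken components; constant cost per woken component."""
    visited = 0
    ran = []
    while run_queue:
        ran.append(run_queue.popleft())
        visited += 1
    return visited, ran


# 1000 components, only one of which has anything to do.
components = {"c%d" % i: False for i in range(1000)}
components["c42"] = True
rr_visited, rr_ran = round_robin_pass(components)

run_queue = deque(["c42"])   # waking a component appends it here
rq_visited, rq_ran = run_queue_pass(run_queue)
```

Both passes run the same single component, but the round-robin pass looks at
all 1000 to find it.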

The reason such performance enhancements aren't the top of my priority list 
though is because kamaelia works fast enough for me at present, but if you're 
designing a new system, it's probably useful to see.

It's possible that a problem at work I'm working on at the moment though may 
require that change to be done. (My preferred way of prioritising kamaelia 
work! :-)

Anyway, picking this apart, the things that I'll note here that we've found 
useful when integrating with an event system:

    * Wake on sender receive
    * Direct delivery
    * Blocking code in a thread. (eg select with a timeout) This can aid
       responsiveness and throughput.
    * Allow the scheduler to sleep, and to be woken by threads.
    * Being naive with threads initially is a good idea.

The other 4 key points I'd make:
   * Today I'd rewrite the CAT to use the STM code directly. You'll note that
      I do use the STM code in the Axon2 rewrite for this sort of purpose.
   * Use optimistic concurrency all the way! :-) (STM + message passing rather
      than locks)
   * The concept of services - (component, inbox) - appears to be very
     powerful in the same way that an IP domain socket which is an (IP, port)
     pair is powerful.
   * We currently default to being active, and explicitly do a self.pause()
     before sleeping. I suspect based on what's been written that this may be
     the wrong way round. (Perhaps simplifying code, and increasing
     throughput) This is based on the O(1) scheduler work, which implies this.

That said, unlike pypes.org's work, and unlike filterpype, the current set up 
does not limit our topologies to acyclic graphs. Indeed, we have pipelines 
and graphlines of all kinds with cycles in, which turn out to be very useful.
(Would be very interesting to integrate with those two libraries though - 
especially pypes)

Concepts I'm considering:
   * Direct plumbing of Queue.Queues between components where Queue.Queues are
     needed. (eliminate the internal generator, but at the risk of increasing
     complexity)
   * Allowing the use of Queue.Queues to implement ad-hoc load balancing in
      the case of multiple consumers.
   * Replacing the CAT's implementation with an STM store.

These aren't urgent from my view, because we have working code and working 
systems, but would be better design decisions overall IMO.

> As soon as I decode the developer information on the co-routine
> integration I'll post back here with what I find.

Please do. I'm finding this really interesting & useful!

Best Regards,


Michael.
-- 
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
