There are several difficult issues connected with asynchronicity, high performance networking and related things. I had to deal with them while developing blip ( http://fawzi.github.com/blip ). My goal with it was to have a good basis for my program dchem, and as a consequence it is not so optimized, in particular for non-recursive tasks, and it is D1, but I think that the issues are generally relevant.

i/o and asynchronicity are a very important aspect, one that will tend to "pollute" many parts of the library and introduce dependencies that are difficult to remove, so those choices have to be made carefully.

Overview:
========

Threads vs fibers:
-----------------------

* an issue not yet brought up is that threads wire some memory, and so have an extra cost that fibers don't.
* the evaluation strategy of fibers can be chosen by the user. This is relevant for recursive tasks where each task spawns other tasks: breadth-first evaluation (which is what threads do) uses a *lot* more resources than depth-first, because many more tasks are in evaluation concurrently.

Otherwise the relevant points already brought forth by others are:

- a context switch between fibers (assuming that the memory is active) is much faster.
- context switches are chosen by the user in fibers (cooperative multitasking). This allows one to choose the optimal point to switch, but a "bad" fiber can ruin the response time of the others.
- D is not stackless (unlike Go, for example), so each fiber needs to have enough space for its stack (something that often is not so easy to predict). This makes fibers still a bit costly if one really needs a lot of them. 64 bit can help here, because hopefully the active part is small and can be kept in RAM, even when using a rather large virtual address space. Still, as Brad correctly said, for heavily uniform handling of many tasks manual management (using stateless functions as much as possible) can be much more efficient.
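The point about evaluation strategy can be illustrated with a small sketch. It is written in Python, with generators standing in for fibers, and it is not blip code: `task` and `run` are hypothetical names, and a "task" here does nothing except spawn its children. The only thing measured is how many tasks are alive (queued or suspended) at the same time.

```python
from collections import deque

def task(depth, fanout):
    """A recursive task: hands each child it spawns to the scheduler."""
    if depth > 0:
        for _ in range(fanout):
            yield task(depth - 1, fanout)

def run(depth, fanout, depth_first):
    """Run the task tree on one thread.

    Returns (peak number of live tasks, number of finished tasks).
    depth_first=True  -> LIFO: a freshly spawned child runs before its parent resumes.
    depth_first=False -> FIFO: parents keep spawning while children queue up.
    """
    pending = deque([task(depth, fanout)])
    peak = finished = 0
    while pending:
        peak = max(peak, len(pending))
        current = pending.pop() if depth_first else pending.popleft()
        try:
            child = next(current)      # run until the task spawns a child
        except StopIteration:
            finished += 1              # the task has nothing left to do
            continue
        pending.append(current)        # parent is suspended, still alive
        pending.append(child)
    return peak, finished
```

With `run(3, 2, depth_first=True)` the peak is the height of the tree (4 live tasks), while the breadth-first run of the same 15 tasks keeps more of them alive at once, and the gap grows quickly with depth and fanout.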

Closures
------------
For low level (often used) operations, delegates and function calls combined with structs and manual memory handling for the "closures" are a good choice when possible, because one can avoid the heap allocation connected with the automatic closure. This approach cannot be avoided in D1, whereas D2 has the very useful closures, but at low level their cost should be avoided when possible.

About using structs there are subtle issues that I think are connected with the optimizations of the compiler (I never really investigated them, I always changed the code or resorted to heap allocation). The main issue is that the compiler would like to optimize as much as possible, and to do so it normally assumes that the current thread is the only user of the stack. If you pass stack stored structures to other threads this assumption isn't true anymore, so the memory of a stack allocated struct might be reused even before the function returns (unless I am mistaken and the ABI forbids it, in which case tell me).

Async i/o
----------

* almost always i/o is much slower than the CPU, so an i/o operation is bound to make the cpu wait, and one wants to use the wait efficiently.
  - a very simple way is to just use blocking i/o, and have other threads do other things.
  - async i/o allows the overlap of several operations in a single thread.
  - for files, an even more efficient way is to share the buffer with the kernel (aio_*).
* an important issue is avoiding the waste of cpu cycles while waiting. To achieve this one can collect several waiting operations and use a single thread to wait on all of them. select, poll and epoll allow this, and increase the efficiency of several kinds of programs.
* libev and libevent are cross platform libraries that help with an event based approach: they take care of checking a large number of events and call a user defined callback when they happen, in a robust cross platform way.
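As a minimal sketch of "a single thread waiting on several operations", the following uses Python's standard `selectors` module, which picks epoll, kqueue, poll or select depending on the platform. It only illustrates the idea and is not how blip or libev do it; `wait_on_many` and the message contents are made up for the example.

```python
import selectors
import socket

def wait_on_many(n_pairs=3):
    """Wait on several sockets from one thread; return {index: bytes received}."""
    sel = selectors.DefaultSelector()
    writers = []
    for i in range(n_pairs):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # 'data' carries whatever we want back when the socket becomes readable
        sel.register(reader, selectors.EVENT_READ, data=i)
        writers.append(writer)

    # Data arrives in reverse order; the waiting thread does not care.
    for i in reversed(range(n_pairs)):
        writers[i].sendall(b"msg %d" % i)

    received = {}
    while len(received) < n_pairs:
        # One call sleeps until at least one registered socket is ready.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(64)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return received
```

The thread consumes no cpu while it sleeps in `select`, and the cost of waiting does not need one thread per pending operation.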

locks, semaphores
------------
Locks and semaphores are a standard way to synchronize between threads. One has to be careful when mixing them with fiber scheduling, as one can easily deadlock.
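A small sketch of how the deadlock comes about, again with Python generators standing in for fibers (all names are hypothetical). One fiber yields while it still holds a thread-level lock. If a second fiber on the same thread then did a blocking acquire, the whole thread would block and the first fiber could never be resumed to release the lock. The sketch avoids the deadlock by using a non-blocking acquire and yielding instead.

```python
import threading

def holder(lock, log):
    lock.acquire()
    log.append("holder: took lock, yielding")
    yield                          # cooperative switch while still holding the lock
    lock.release()
    log.append("holder: released")

def contender(lock, log):
    # A plain lock.acquire() here would block the thread that also has to
    # resume 'holder', so the lock would never be released: deadlock.
    while not lock.acquire(blocking=False):
        log.append("contender: lock busy, yielding")
        yield
    log.append("contender: took lock")
    lock.release()

def run_round_robin(fibers):
    """Trivial single-thread scheduler: resume each fiber in turn."""
    fibers = list(fibers)
    while fibers:
        fiber = fibers.pop(0)
        try:
            next(fiber)
            fibers.append(fiber)
        except StopIteration:
            pass

def demo():
    lock = threading.Lock()        # non-reentrant, knows nothing about fibers
    log = []
    run_round_robin([holder(lock, log), contender(lock, log)])
    return log
```

A lock that knows about the fiber scheduler (suspending the fiber rather than the thread) is the cleaner solution; the busy retry above is only there to make the problem visible.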

Hardware information
-----------------------------
Efficient usage of computational resources also depends on being able to identify the available hardware. Don did quite some hacking to get useful information out of cpuinfo, but if one is interested in more complex computers more info would be nice.
I use hwloc for this purpose: it is cross platform and can be embedded.

Possible solutions
==============

Async i/o can be presented as normal synchronous (blocking) i/o, but this makes sense only if one has several objects waiting, or uses fibers and executes other fibers while waiting. How acceptable is it to rely on (and thus introduce a dependency on) things like libev or hwloc? For my purposes using them was ok, and they are cross platform and embeddable, but is that true also for phobos?

Asynchronicity means being able to have work executed concurrently and then resynchronize at a later point. One can use processes (which also give memory protection), threads, or fibers to achieve this. If one uses just threads, then asynchronous i/o makes sense only with fully manual (explicit) handling of it; hiding it away would be equivalent to blocking i/o. Fibers allow one to hide async i/o and make it look like blocking i/o, but as Sean said there are issues with using fibers with D2 TLS. I kind of dislike the use of TLS for anything but low level infrastructure, but it seems I am the only one around here.

In blip I chose to go with fiber based switching.
I wrapped libev both at a low level and at a higher level, in such a way that one can use it directly (for maximum performance). For the sockets I use non blocking calls and a single "waiting" (io) thread, but hide them so that they are used just like blocking calls.
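To make the idea of "non blocking underneath, blocking in appearance" concrete, here is a rough sketch in Python. It is not blip's implementation: generators stand in for fibers, the scheduler loop itself does the waiting instead of a separate io thread, and every name (`recv_fiber`, `compute_fiber`, `run_fibers`, `demo`) is invented for the example.

```python
import selectors
import socket
from collections import deque

def recv_fiber(sock, name, out):
    """Reads like blocking code: wait for the socket, then read from it."""
    yield sock                     # suspend this fiber until sock is readable
    out.append((name, sock.recv(64)))

def compute_fiber(out):
    for step in range(3):
        out.append(("compute", step))
        yield None                 # plain cooperative yield, nothing to wait for

def run_fibers(fibers):
    sel = selectors.DefaultSelector()
    ready = deque(fibers)
    waiting = 0                    # fibers parked on a socket
    while ready or waiting:
        if waiting:
            # Poll if there is other work to do, sleep if there is none.
            timeout = 0 if ready else None
            for key, _events in sel.select(timeout):
                sel.unregister(key.fileobj)
                waiting -= 1
                ready.append(key.data)     # the parked fiber can run again
        if not ready:
            continue
        fiber = ready.popleft()
        try:
            wanted = next(fiber)
        except StopIteration:
            continue
        if wanted is None:
            ready.append(fiber)
        else:
            sel.register(wanted, selectors.EVENT_READ, data=fiber)
            waiting += 1
    sel.close()

def demo():
    a, b = socket.socketpair()
    out = []
    def sender():
        yield None                 # let the other fibers start first
        b.sendall(b"hello")
    run_fibers([recv_fiber(a, "reader", out), compute_fiber(out), sender()])
    a.close()
    b.close()
    return out
```

The reader fiber never sees a callback or a readiness flag, and the computation keeps running while the reader is parked, which is the whole point of hiding the asynchronous calls behind the scheduler.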

An important design decision when using fibers is whether one should be able to have a "naked" thread, or whether the fiber scheduling should be hidden in each thread. In blip I went for allowing naked threads, because it is entirely realized as a normal library, but that gives some ugly corner cases when one uses a method that wants to suspend in a thread that doesn't have scheduling in place. Building the scheduling into all threads is probably cleaner if one goes with fibers. The problem of TLS and fibers remains though, especially if one allows the migration of fibers from one thread to another (as I do in blip).

An important design choice in blip was being able to cope with recursive parallelism (typical of computational tasks), not just with the concurrent parallelism that is typical of servers. I feel that it is important, but it is something that might not be seen as such by others.

To do
====
Now, about async i/o, the first step is for sure to expose an asynchronous API. This doesn't influence or depend on other parts of the library much.
An important decision is if, and on which, external libraries one can rely.

Making the async API nicer to use, or even using it "behind the scenes" as I do in blip, needs more complex choices on the basic handling of suspension and synchronization. Something like that is bound to be used in several parts of phobos, so a careful choice is needed.

These parts are also partially connected with high performance networking (another GSoC project).

Fawzi
