Asynchronicity and more

Fawzi Mohamed Sat, 02 Apr 2011 17:25:56 -0700

There are several difficult issues connected with asynchronicity, high-performance networking and related things. I had to deal with them while developing blip ( http://fawzi.github.com/blip ). My goal with it was to have a good basis for my program dchem, and as a consequence it is not so optimized, in particular for non-recursive tasks, and it is D1, but I think that the issues are generally relevant.

i/o and asynchronicity are a very important aspect, one that will tend to "pollute" many parts of the library and introduce dependencies that are difficult to remove, so those choices have to be made carefully.


Overview:
========

Threads vs fibers:
-----------------------

* an issue not yet brought up is that threads wire some memory, and so have an extra cost that fibers don't.
* the evaluation strategy of fibers can be chosen by the user. This is relevant for recursive tasks where each task spawns other tasks: different strategies have very different costs (breadth-first evaluation, like threads, uses a *lot* more resources than depth-first, by having many more tasks concurrently in evaluation).


Otherwise the relevant points already brought forth by others are:

- context switching between fibers (assuming that the memory is active) is much faster
- context switches are chosen by the user in fibers (cooperative multitasking); this allows one to choose the optimal point to switch, but a "bad" fiber can ruin the response time of the others.
- D is not stackless (like Go for example), so each fiber needs to have enough space for the stack (something that often is not so easy to predict). This makes fibers still a bit costly if one really needs a lot of them. 64 bit can help here, because hopefully the active part is small and can be kept in RAM, even when using a rather large virtual space. Still, as correctly said by Brad, for heavily uniform handling of many tasks manual management (and using stateless functions as much as possible) can be much more efficient.
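The point about evaluation strategies can be made concrete with a small sketch. It is written in Python rather than D, with generators standing in for fibers; the names `task` and `peak_live_tasks` are made up for this illustration and have nothing to do with blip's actual scheduler. It only counts how many tasks are alive at once under each strategy.

```python
from collections import deque

def task(depth, fanout):
    """A recursive task modelled as a generator ("fiber"): it yields each
    subtask it spawns and finishes once all of them have been spawned."""
    if depth > 0:
        for _ in range(fanout):
            yield task(depth - 1, fanout)

def peak_live_tasks(root, depth_first):
    """Run every task on one thread and return the largest number of
    unfinished tasks that existed at the same time."""
    pending = deque([root])
    peak = 0
    while pending:
        peak = max(peak, len(pending))
        # Depth first: continue with the most recently spawned task (LIFO).
        # Breadth first: continue with the oldest task (FIFO).
        current = pending.pop() if depth_first else pending.popleft()
        try:
            child = next(current)
        except StopIteration:
            continue  # this task is finished, drop it
        if depth_first:
            pending.append(current)  # parent resumes only after the child
            pending.append(child)
        else:
            pending.append(child)
            pending.append(current)  # parent and child both stay in the queue
    return peak
```

With depth-first evaluation only one chain from the root to a leaf is alive at any time, so the peak is depth + 1; with breadth-first evaluation the queue keeps growing as every live task takes its turn spawning.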


Closures
------------

When possible, and for the low-level (often used) operations, delegates and function calls are the preferable solution. Structs and manual memory handling for "closures" are a good choice for low-level operations, because one can avoid the heap allocation connected with the automatic closure. This approach cannot be avoided in D1, whereas D2 has the very useful closures, but at a low level their cost should be avoided when possible.

About using structs there are subtle issues that I think are connected with the optimizations of the compiler (I never really investigated them, I always changed the code or resorted to heap allocation). The main issue is that the compiler tries to optimize as much as possible, and to do so it normally assumes that the current thread is the only user of the stack. If you pass stack-stored structures to other threads these assumptions aren't true anymore, so the memory of a stack-allocated struct might be reused even before the function returns (unless I am mistaken and the ABI forbids it; in this case tell me).


Async i/o
----------

* almost always i/o is much slower than the CPU, so an i/o operation is bound to make the CPU wait, and one wants to use the wait efficiently.
  - a very simple way is to just use blocking i/o, and have other threads do other work.
  - async i/o allows overlap of several operations in a single thread.
  - for files, an even more efficient way to communicate is sharing the buffer with the kernel (aio_*)
  - an important issue is avoiding wasting CPU cycles while waiting; to achieve this one can collect several waiting operations and use a single thread to wait on all of them. select, poll and epoll allow this, and increase the efficiency of several kinds of programs.
  - libev and libevent are cross-platform libraries that can help in having an event-based approach, taking care of checking a large number of events and calling a user-defined callback when they happen, in a robust cross-platform way.
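As an illustration of a single thread waiting on several operations, here is a minimal Python sketch based on the standard `selectors` module, which picks epoll, kqueue, poll or select depending on the platform. The function name `wait_on_many` and the use of socket pairs are assumptions made for the example; it is not how blip or libev is implemented.

```python
import selectors
import socket

def wait_on_many(readers, timeout=1.0):
    """Wait on all sockets in `readers` (a dict name -> socket) from a
    single thread. Returns {name: first chunk received} once every socket
    has delivered something."""
    received = {}
    with selectors.DefaultSelector() as sel:
        for name, sock in readers.items():
            sock.setblocking(False)
            sel.register(sock, selectors.EVENT_READ, data=name)
        while len(received) < len(readers):
            # The thread sleeps here without burning CPU until at least
            # one registered socket becomes readable.
            events = sel.select(timeout)
            if not events:
                raise TimeoutError("no socket became readable in time")
            for key, _mask in events:
                received[key.data] = key.fileobj.recv(4096)
                sel.unregister(key.fileobj)
    return received

if __name__ == "__main__":
    a_read, a_write = socket.socketpair()
    b_read, b_write = socket.socketpair()
    b_write.sendall(b"pong")
    a_write.sendall(b"ping")
    print(wait_on_many({"a": a_read, "b": b_read}))
```

A callback-based library in the style of libev or libevent wraps the same loop and calls a user-supplied function for each event instead of collecting results in a dictionary.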


locks, semaphores
------------

Locks and semaphores are a standard way to synchronize between threads. One has to be careful when mixing them with fiber scheduling, as one can easily deadlock.
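A small Python sketch of how this deadlock arises, again with generators standing in for fibers. The names `holder`, `contender` and `run_round_robin` are invented for the illustration. One fiber yields while holding a thread lock; if a second fiber on the same thread then made a blocking acquire, the thread would block and the first fiber could never run again to release the lock. The sketch avoids the hang by trying the lock without blocking and yielding on failure.

```python
import threading

def holder(lock, log):
    lock.acquire()
    log.append("holder: took lock")
    yield                      # cooperative switch while still holding the lock
    lock.release()
    log.append("holder: released lock")

def contender(lock, log):
    # A plain lock.acquire() here would block the whole thread, including
    # the holder fiber that has to run in order to release: a deadlock.
    while not lock.acquire(blocking=False):
        log.append("contender: lock busy, yielding")
        yield
    log.append("contender: took lock")
    lock.release()

def run_round_robin(fibers):
    """Very small cooperative scheduler: resume each fiber in turn until
    all of them have finished."""
    fibers = list(fibers)
    while fibers:
        for fiber in list(fibers):
            try:
                next(fiber)
            except StopIteration:
                fibers.remove(fiber)

if __name__ == "__main__":
    lock, log = threading.Lock(), []
    run_round_robin([holder(lock, log), contender(lock, log)])
    print("\n".join(log))
```

The same reasoning applies to semaphores and condition variables: any primitive that blocks the thread also blocks every fiber scheduled on it, so a fiber-aware scheduler needs primitives that suspend only the calling fiber.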


Hardware information
-----------------------------

Efficient usage of computational resources also depends on being able to identify the available hardware. Don did quite some hacking to get useful information out of cpuinfo, but if one is interested in more complex computers more info would be nice.

I use hwloc for this purpose; it is cross-platform and can be embedded.

Possible solutions
==============

Async i/o can be presented as normal synchronous (blocking) i/o, but this makes sense only if one has several objects waiting, or uses fibers and executes other fibers while waiting.

How acceptable is it to rely on (and thus introduce a dependency on) things like libev or hwloc? For my purposes using them was ok, and they are cross-platform and embeddable, but is that true also for phobos?

Asynchronicity means being able to have work executed concurrently and then resynchronize at a later point. One can use processes (that also give memory protection), threads, or fibers to achieve this. If one uses just threads, then asynchronous i/o makes sense only with a fully manual (explicit) handling of it; hiding it away will be equivalent to blocking i/o. Fibers allow one to hide async i/o and make it look like blocking i/o, but as Sean said there are issues with using fibers with D2 TLS. I kind of dislike the use of TLS for non-low-level infrastructure stuff, but it seems that is just me around here.


In blip I chose to go with fiber-based switching.

I wrapped libev both at a low level and at a higher level, in such a way that one can use it directly (for maximum performance). For the sockets I use non-blocking calls and a single "waiting" (io) thread, but hide them so that they are used just like blocking calls.
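The idea of hiding non-blocking calls behind something that reads like a blocking call can be sketched as follows. This is Python with generators as fibers and the standard `selectors` module in place of libev; `recv_line`, `echo_upper` and `run` are hypothetical names, and for brevity the sketch runs the fibers and the waiting loop on one thread instead of using a dedicated io thread as described above.

```python
import selectors
import socket
from collections import deque

def recv_line(sock):
    """Reads one line. To the calling fiber this looks like a blocking
    read, but each wait suspends only the fiber, not the thread."""
    data = b""
    while not data.endswith(b"\n"):
        yield sock              # ask the scheduler to wait for readability
        chunk = sock.recv(4096)
        if not chunk:
            break               # peer closed the connection
        data += chunk
    return data

def echo_upper(sock, results):
    line = yield from recv_line(sock)   # reads like a plain blocking call
    results.append(line.strip().upper())

def run(fibers):
    """Run fibers until all finish; fibers waiting for i/o are parked in
    the selector and resumed when their socket becomes readable."""
    ready = deque(fibers)
    waiting = 0
    with selectors.DefaultSelector() as sel:
        while ready or waiting:
            while ready:
                fiber = ready.popleft()
                try:
                    sock = next(fiber)
                except StopIteration:
                    continue    # fiber finished
                sel.register(sock, selectors.EVENT_READ, data=fiber)
                waiting += 1
            if waiting:
                for key, _mask in sel.select():
                    sel.unregister(key.fileobj)
                    waiting -= 1
                    ready.append(key.data)

if __name__ == "__main__":
    a, a_peer = socket.socketpair()
    b, b_peer = socket.socketpair()
    for s in (a, b):
        s.setblocking(False)
    b_peer.sendall(b"world\n")
    a_peer.sendall(b"hello\n")
    results = []
    run([echo_upper(a, results), echo_upper(b, results)])
    print(results)
```

The sketch assumes each fiber waits on its own socket; two fibers waiting on the same socket would need a queue per socket, which is left out here.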

An important design decision when using fibers is whether one should be able to have a "naked" thread, or hide the fiber scheduling in each thread. In blip I went for allowing naked threads, because it is entirely realized as a normal library, but that gives some ugly corner cases when one uses a method that wants to suspend in a thread that doesn't have scheduling in place. Building the scheduling into all threads is probably cleaner if one goes with fibers. The problem of TLS and fibers remains though, especially if one allows the migration of fibers from one thread to another (as I do in blip).

An important design choice in blip was being able to cope with recursive parallelism (typical of computational tasks), not just with the concurrent parallelism that is typical of servers. I feel that it is important, but it is something that might not be seen as such by others.
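The TLS problem with migrating fibers can be shown by analogy in Python, using `threading.local` as the thread-local storage and a generator as the fiber. The names `fiber` and `resume_on_new_thread` are made up for the example, and the analogy to D2's thread-local variables is only approximate.

```python
import threading

tls = threading.local()

def fiber(log):
    tls.request_id = 42          # stored in the TLS of the thread running us now
    log.append(getattr(tls, "request_id", None))
    yield                        # suspended; the scheduler may migrate us
    # After migration the same code sees the TLS of a different thread.
    log.append(getattr(tls, "request_id", None))

def resume_on_new_thread(gen):
    """Resume a suspended fiber on a freshly started thread."""
    worker = threading.Thread(target=lambda: next(gen, None))
    worker.start()
    worker.join()

if __name__ == "__main__":
    log = []
    f = fiber(log)
    next(f)                      # first part runs on the main thread
    resume_on_new_thread(f)      # second part runs on another thread
    print(log)                   # [42, None]
```

The value written before the switch is simply not there after the fiber resumes elsewhere, which is why code that relies on TLS and fibers that migrate between threads do not mix well.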


To do
====

Now, about async i/o: the first step is for sure to expose an asynchronous API. This doesn't influence or depend on other parts of the library much.

An important decision is whether, and on which, external libraries one can rely.

Making the async API nicer to use, or even using it "behind the scenes" as I do in blip, needs more complex choices on the basic handling of suspension and synchronization. Something like that is bound to be used in several parts of phobos, so a careful choice is needed.

These parts are also partially connected with high-performance networking (another GSoC project).


Fawzi
