Re: benchmarking various event loops with and without anyevent

Marc Lehmann Fri, 25 Apr 2008 23:31:08 -0700

On Sat, Apr 26, 2008 at 01:14:07AM -0400, Uri Guttman <[EMAIL PROTECTED]> wrote:
>   ML> Maybe I'll provide a backend for stem.
> 
> actually it makes more sense to me to wrap anyevent in stem. it already


of course, using anyevent always makes sense.

however, using anyevent doesn't solve the interoperability problem: you
still cannot use modules that use anyevent when your program uses, for
example, the wx interface instead of anyevent.

> has several event wrappers (pure perl, event.pm and tk) and wrapping is
> very easy to do. not much different than the code i see in anyevent.

Yes, but that doesn't give you much advantage - as long as your module
isn't a good citizen that plays nicely with other modules, and instead
monopolises the process, it is not interoperable.

Making anyevent compatible with stem enables anyevent modules to be used in
stem. Making stem use anyevent doesn't achieve that as long as it doesn't
use its anyevent backend, and it cannot be used in other programs
either, as it isn't event-loop agnostic: it forces the user to use
"stem" as the event loop.

Also, this would make it impossible to benchmark the pure perl event loop
- I would *predict* (but I am bad at that) that it will be the slowest
one, ignoring POE, which I expect to be much slower still.

>   ML> Especially in the important case of many handles/few active ones, an
>   ML> approach of "scan all handles" is very slow.
> 
> my code doesn't scan all handles, but scans all writeable events to see if
> their handle bits are set.

That's even worse, as there can be many more events (watchers?) than file
descriptors (in the first benchmark you would scan, say, 10000 event
watchers for the same fd). Even if not, it's slow O(n), compared to the fast
O(n) that you can achieve by scanning the bitmasks.

It is only fast if you have a lot of handles and very few active watchers,
which isn't too typical (who uses all those extra fd's if not your
program?).

In any case, the "scan mask" approach is many times faster in the
actual benchmark with 10000 handles and 100 active ones, simply because
scanning a mask is so much faster.

Besides, select returns the number of active handles, so one could use
both approaches and select between them, e.g. when more than 80% of the
handles are active, use the scan-all-handles method, otherwise the bitmask
method.
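
A minimal sketch of that hybrid, in Python purely for illustration (it is
not taken from AnyEvent, Stem or any other module discussed here). The
names scan_mask, scan_handles and dispatch are hypothetical, the ready set
is modelled as a plain integer bitmask, and the 80% threshold is simply the
figure mentioned above:

```python
def scan_mask(ready_mask, watchers_by_fd):
    """Walk only the set bits of the ready mask (hypothetical helper)."""
    fired = []
    while ready_mask:
        low = ready_mask & -ready_mask        # isolate the lowest set bit
        fd = low.bit_length() - 1             # bit position == fd number
        fired.extend(watchers_by_fd.get(fd, ()))
        ready_mask ^= low                     # clear that bit
    return fired


def scan_handles(ready_mask, watchers_by_fd):
    """Test the ready bit of every registered handle (hypothetical helper)."""
    fired = []
    for fd, callbacks in watchers_by_fd.items():
        if (ready_mask >> fd) & 1:
            fired.extend(callbacks)
    return fired


def dispatch(ready_mask, nready, watchers_by_fd, threshold=0.8):
    """Pick a strategy from the ready count that select reports."""
    if nready > threshold * len(watchers_by_fd):
        return scan_handles(ready_mask, watchers_by_fd)
    return scan_mask(ready_mask, watchers_by_fd)


# Ten registered handles, two of them ready: the mask scan is chosen.
watchers = {fd: ["cb%d" % fd] for fd in range(10)}
ready = (1 << 3) | (1 << 7)
fired = dispatch(ready, 2, watchers)
```

Both strategies return the same callbacks; only the amount of work per
iteration differs.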

> descriptors increment and can cause bloat of the array if you have many
> in use (and not all of them in the event loop).

That is not a common case. Besides, arrays are very compact, unlike
hashes, so it's not a clear win (note how the pure perl backend in
anyevent comes out as one of the backends using the least amount of
memory).

In any case, a lot of technology in the kernel goes into providing "small
integers" as fd's, and not taking advantage of that gives away optimisation
opportunities. In this case, unix guarantees that the memory use is
bounded (there is even a resource limit for it, and reserving 4 or 8
bytes/file descriptor is nothing really).
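
To illustrate the array-indexed-by-fd idea, here is a small sketch in
Python (illustration only, not the data structure of any module mentioned
here). The class name FdTable and its methods are hypothetical. The table
grows only up to the highest fd registered, which the per-process fd limit
bounds:

```python
class FdTable:
    """Watcher lists stored in an array indexed directly by fd number."""

    def __init__(self):
        self.slots = []          # slots[fd] is None or a list of watchers

    def add(self, fd, watcher):
        # Grow the array up to the new fd; unused slots cost one reference each.
        if fd >= len(self.slots):
            self.slots.extend([None] * (fd + 1 - len(self.slots)))
        if self.slots[fd] is None:
            self.slots[fd] = []
        self.slots[fd].append(watcher)

    def lookup(self, fd):
        # Plain index lookup, no hashing involved.
        if fd < len(self.slots) and self.slots[fd] is not None:
            return self.slots[fd]
        return []


table = FdTable()
table.add(5, "reader-a")
table.add(5, "reader-b")
table.add(2, "writer-c")
```

Here the array has six slots, three of which are empty, and that is the
whole extent of the "bloat" for a program whose highest fd is 5.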

Trying to avoid bloat on that side is the wrong side to optimise for.

> event loops are async in that you get callbacks as they are needed. sure

Yes, but the I/O isn't async, which was my point. Asynchronous I/O is
quite a different beast, but few people really use it (which is a pity,
but only perl has an IMnsHO decent module for it).

> api are what matters. a better term may be non-blocking i/o (and socket

It's actually the only correct term, as no I/O is done in the event loop.

>   ML> Also, this does not explain at all why Event is so slow, and why Glib
>   ML> scales so extremely badly. Most of the stuff that slows down the
>   ML> perl-based event loop is clearly stuff that is much much faster in C.
> 
> poor memory management in the c code?

Perl's memory management is quite good, yeah. I do suspect that it
has something to do with glib scanning its watcher list (ALL of them)
repeatedly, and when removing, who knows, it might run a singly-linked
list to its end to remove the watcher.

As for Event, I think it simply does way too much around simple callback
invocation, for example it uses its event queue and adds events at the end
(walking the list each time). All that the event queue has done for me,
however, was causing infinite memory growth when it added multiple events
for the same fd again and again because some high-priority watcher got
precedence.

(I have nothing against event queues, you need one, but one *can* manage
it abysmally).

> a framework where memory management was fairly fast due to cached queues
> of known block sizes. alloc/free were usually a trivial call and the
> event code had it own private fixed size buffer queues. it had no
> problem doing 2k parallel web fetches including all the url parsing all
> on a single sparc. we had to throttle it down to keep up with the slower

hehe :)

>   ML> For a reasonable event loop implemented in C, the overhead of
>   ML> calling select should completely dominate the rest of the
>   ML> processing (it does so in EV).
> 
> true, but bad c (and perl) code is all around us.

hehe :/

>   ML> - how can I do one iteration of the event loop? This is important
> 
> i don't have an api for oneevent. i still haven't seen a need for
> it.

> hence having stem wrap anyevent may be the better way.

It's the easier way, because you don't have to provide the common interface
anyevent offers, but it is also far less useful - since Stem seems to be
like POE in that it dominates the process (it's me and nothing else),
there is likely little practical need for interoperability, but it sure
would be nice :)

> the goal for me is to have stem support more (and faster) event loops.
> if you are doing a stem app, you should use the stem event api.

*sigh*, I think the world would be a much better place if not every module
author employing events insisted on ruling out all others :)

>   ML>   for condvars: the function must not block after one or more events
>   ML>   have been handled.
> 
> ??

Some event models had bugs in that they:

1. handled some timer events
2. blocked waiting for some i/o events

This must not happen in the "one event loop iteration" function. It can
handle as many events as it likes, but when it handles at least one event,
it must not block.
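
A rough sketch of such a "one iteration" function, in Python for
illustration only (this is not how AnyEvent, Stem or any particular event
loop implements it). The names run_once, run_due_timers and poll are
hypothetical, timers are assumed to live in a heap of (deadline, sequence
number, callback) tuples, and readers in a dict mapping fd to callback:

```python
import heapq
import select
import time


def run_due_timers(timers):
    """Run every timer whose deadline has passed; return how many ran."""
    handled = 0
    now = time.monotonic()
    while timers and timers[0][0] <= now:
        _, _, callback = heapq.heappop(timers)
        callback()
        handled += 1
    return handled


def poll(readers, timeout):
    """Wait for readable fds; a timeout of 0 polls, None waits indefinitely."""
    if readers:
        ready, _, _ = select.select(list(readers), [], [], timeout)
        return ready
    if timeout:
        time.sleep(timeout)      # nothing to watch, just wait for the timer
    return []


def run_once(timers, readers):
    """One loop iteration that never blocks once an event was handled."""
    handled = run_due_timers(timers)
    if handled:
        timeout = 0.0            # already did work: only poll, do not wait
    elif timers:
        timeout = max(0.0, timers[0][0] - time.monotonic())
    else:
        timeout = None           # no timers at all: wait for I/O
    for fd in poll(readers, timeout):
        readers[fd]()
        handled += 1
    handled += run_due_timers(timers)
    return handled
```

The buggy behaviour described above corresponds to passing the time until
the next timer (or None) to poll even when the first step has already run
callbacks.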

>   ML> - how do I register a callback to be called when an fh or fd becomes
>   ML>   "readable" or "writable"?
> 
> the sessions examples show plenty of that. very similar to event.pm overall.

Actually, I tried, but I found the testsuite is much clearer on that.

>   ML> - how can I register a relative timer?
> 
> relative? is that a delay timer (relative to now)? event.pm has hard
> (timing from before callback to next trigger) and soft (time from after
> callback to next trigger) timers.

I found it, it only supports relative timers (but internally uses absolute
timing).

>   ML> - are there any limitations, for example, what happens when I register
>   ML>   two read watchers for one fh (or fd)?
> 
> shouldn't be a problem if select handles it. the stem perl event loop is
> a wrapper around select.

Well, the Tk backend doesn't handle it, unless I missed something, and the
lowest common denominator counts.

(Since AnyEvent prefers using its Tk backend this is not an issue unless a
user would force the Stem backend).

>   ML> - how about child status changes or exits?
> 
> signals are signals. it handles sigchld and the stem::proc module deals
> with reaping as needed.

Well, signals are signals, but we are talking about waitpid here - how do
I get the info that stem::proc reaps in my sigchld handler?

>   ML> - how does stem handle time jumps, if at all?
> 
> ??

bios sets the time to 2010, ntp resets it to 2005, do stem programs then
sleep for 5 years? answer is: likely yes.

>   ML> - are its timers based on absolute or wallclock time or
>   ML>   relative/monotonic time?
> 
> iirc it checks wallclock (HiRes::time()) after each call to select and
> gets its current delta from that. you can't trust a relative clock as it
> will skew around.

It is the other way around, a relative clock does exactly the same thing
as your timers (you tell it to delay 5 seconds, it will delay 5 seconds),
an absolute clock does not. For example, a user who registers a 5 second
delay with Stem might find out that it takes years to complete because it
uses wallclock time which can change, unlike, for example, a true monotonic
clock, or some other time source that allows me to build relative timing.

But that is ok, as far as I know:

- libevent and Event::Lib use a monotonic clock to get relative timers
- Event has some crude checking for time jumps
- EV handles relative and absolute timers about perfectly
- all else just suffers from timejumps

So stem is in good company, it's a perfectionist issue (although each one
of my programs that didn't handle relative timers correctly got a bug
report in the past because of that, so it happens quite often, even on
servers).
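
The difference between a relative timer built on wallclock time and one
built on a monotonic clock can be shown with a small simulation, in Python
for illustration only. The Timer class and the two simulated clocks are
hypothetical stand-ins. In a real program the clocks would be something
like time.time() and time.monotonic():

```python
class Timer:
    """A relative timer that stores an absolute deadline on some clock."""

    def __init__(self, clock, delay):
        self.clock = clock
        self.deadline = clock() + delay

    def remaining(self):
        return max(0.0, self.deadline - self.clock())


# Simulated clocks so the time jump is reproducible (assumed values).
wall = [1000.0]                  # wallclock seconds, can be reset by ntp
mono = [50.0]                    # monotonic seconds, only moves forward

wall_timer = Timer(lambda: wall[0], 5.0)
mono_timer = Timer(lambda: mono[0], 5.0)

# One second passes, then the wallclock is set back by five years.
five_years = 5 * 365 * 86400.0
wall[0] += 1.0
mono[0] += 1.0
wall[0] -= five_years

wall_left = wall_timer.remaining()   # 4 seconds plus five years
mono_left = mono_timer.remaining()   # 4 seconds, as requested
```

The wallclock-based timer now waits until the clock catches up again, while
the monotonic one is unaffected by the jump.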

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      [EMAIL PROTECTED]
      -=====/_/_//_/\_,_/ /_/\_\
