Re: [Libevent-users] announcing libev, towards a faster and more featureful libevent

Scott Lamb Mon, 05 Nov 2007 14:13:32 -0800

Marc Lehmann wrote:

On Sun, Nov 04, 2007 at 09:37:52PM -0800, Scott Lamb <[EMAIL PROTECTED]> wrote:

There is actually a lot of code that relies on this just working, and the
only other event loop I know that has a problem with this is Tk.

Ugh, I'd argue that idiom is broken. But if the support's free, I guess
it doesn't matter.


Well, the idiom is how it works in most languages nowadays, so wether broken
or not its important to support.

The main problem is that with the current libevent implementation, failures
are completely silent.

I've used garbage-collected languages without this problem. It's not thelanguage; it's the idiom.

This is true, but its an optional feature you don't have to use.  In case
you wonder, EV, the perl interface to libev, uses this feature.

It makes most sense when embedding, of course (not all the world is an .so
:).

Hmm, in your Perl example, I wouldn't rule out you wanting to share the
event loop with some C-based library and being unable to do so.


If you want to share it, you of course cannot do that unless that c-based
library uses the same version of the code.

But this is in no way difefrent to libevent: if one program links it
statically and other shared, or if one program uses 1.3d and another 1.3e
(different sonames), then you can't share it either.


The former is of course the fault of the linking program. The latter is
fixed (recently); libevent is now versioning properly.

There's definitely nothing you can't do with a void*, so this is all a
question of efficiency.


Its also a question of coding overhead, which is probably more important than
efficiency in this case.

Especially as efficieny is a red herring. in the libev proposed api,
callbacks get the watcher object passed, which

a) is often what you want anyways
b) is allocated by user code, so you cna just make the struct larger
   by appending your data.

this solves the efficiency issue and also makes for a simpler API.

In fact, I pondered long on wether I even add a void *arg to each event
watcher, but I thought the overhead of havign to "subclass" is large in the
common case where a single pointer indeed is enough.

In any case, let me repeat that this is an optional feature, not a
limitation, in the API.


It is an invitation for abuse (losing compatibility).

I assert that the cost of a sizeof(void*) to
point to the relevant part of your structure (which can be
nearby...still reasonable cache locality) is not too high.


Its higher than necessary, and more complex than neccessary.


Higher than necessary? I don't care; it's worth it.

More complex than necessary? Not really.

All callbacks will be called with EV_ERROR when an error occurs. And yes,
if you don't do error handlign and endlessly retry the same operation in a
loop, you run into problems.

But as that is an obvious programming bug, I don't see any problem here.

Hmm. Let me introduce a use case: an event-driven program which must not
fail. init or similar.

I worked on such a program recently. If it were unreliable, you would
have to send the system back to the factory for repair (i.e., flashing
new software). On ENOMEM, it would basically sleep and retry. This was
quite successful (memory could be temporarily consumed by network
buffers, etc, which cleared itself up after a while).


This could be solved easier with having a generic callback thats called in
such special conditions.

The alternative to have every event_add etc. call surrounded by checking
code is not realistic, especially if some of the code is not easily
changable (such as the libevent code itself which doesn't check for such
errors!).

For this program, it's important to know more than that an error has
occurred. EV_ERROR is totally inadequate.


You already know more: you know which watcher caused it, or which watcher is
responsible.


Erm, no. In the ENOMEM case, I know which fd you decided to kill, though
not actually that you've killed it. ENOMEM is not a per-file descriptor
problem.

What is my program supposed to do? It can't distinguish them, and the
correct behavior in each of these conditions is totally different. Also,
in the program I'm thinking of, "libev chose to kill this file
descriptor" probably means a network link just went down. Ergh.


killing the file descriptor might indeed be harsh. all that libev does now is
stop the event watcher and signal an error.


That's better...in the sense that "we've stopped murdering people; now
we only beat them severely" is better.

Besides, if you cannot malloc the few bytes ev_once requires you need a
*lot* of good error handlign code to continue sensibly.

Yes, as I've mentioned above, there are programs for which this level of
error handling is necessary.


I agree, but libevent already doesn't provide this, so its moot to compare
this case to libevent. However, it would be very interetsing to actualyl talk
about an API that solves this problem.

For example, on ENOMEM, one could have a callback thats called, because I cna
imagine that sleeping is a common situation.

However, for select, its quite likely that sleepign will not help ever,
so your example init program would simply freeze, as it cna reattempt the
select as many times as it wants but it won't change because many osses
issue enomem not just in response to real memory shortness but simply when
the number of fds grows high (in fact, its usually the only case where
this happens).

The default action could be to stop some watchers, while a callback might
decide to do something else.

What kind of error callbacks would you think are necessary?

I would identify the same cases we already have (fd_enomem, fd_badf) and
additionally enomem for any malloc-type error.

The event watcher would be signalled in any case, and the callback can decide
wether to use the default action or do its own stuff.

Would that be better?


I can't speak to "many oses", but for the particular one I was using
ENOMEM was truly a transient condition; there were global resources in
play that were likely to be freed after a while. (This was also poll()
rather than select().)

I'm looking at the Linux kernel source code now (core_sys_select), andyou are wrong on Linux as well:


338         if (size > sizeof(stack_fds) / 6) {

339 /* Not enough space in on-stack array; must usekmalloc */

340                 ret = -ENOMEM;
341                 bits = kmalloc(6 * size, GFP_KERNEL);
342                 if (!bits)
343                         goto out_nofds;
344         }

While porting libevent to our system (which actually in our caseinvolved redoing the API to match our conventions), we consideredintroducing an ENOMEM callback similar to the one you've mentioned. Itmight be useful, but it's nothing the caller can't do on its own byretrying the failed function.


Our API looked something like this:

int libname_eventloop_init(libname_eventloop_t **, constlibname_eventloop_attr_t *);


(note similarity to pthread API)

which allowed attributes specified at eventloop creation time(extensible without further API changes). One of them might have been

libname_eventloop_attr_setenomem(libname_eventloop_attr_t *, void(*enomem_callback)(/*undecided*/), void *enomem_callback_arg);

If the app cannot handle that, deep shit.

There are applications which can handle you returning ENOMEM to them on
select(), but basically no applications which can reliably handle you
randomly closing their file descriptors.


Ok, what will they do if the problem in select is that fd numbers used
wree too high? How can this sensibly be handled other than by aborting
and, say, restarting?

* any X11 application is screwed if you close its connection to the server


Uhm, no? :)


Uhm, yes.

* any server is going to assume nothing will happen to its listen socket


For realistic cases the listen socket won't be affected by enomem as it has a
lower number than the fds causing the problem.

Not necessarily. Servers do occasionally reload their listen sockets(SIGHUP handlers); they just don't have logic to do so on demand from acrazy library.


And in the case of ebadf, well, your listen socket is already gone
anyways, no harm done.

In any case, you can get the same behaviour as libevent by calling unloop
in case of an error, so the interface is strictly more powerful.

No, on error your library may have muddied the water already by screwing
with the file descriptors. libevent also makes errors clearer by simply
returning error from the failed function (I'm thinking of event_once()
vs ev_once() here.)


I wrote about this in my other mail.

I do think the design is useful in practise, where error handling is rarely
done to the utmost extent and sensible behaviour in dead-end situations
counts a lot.

There are a lot of sucky programs out there, but I think it's important
to not penalize the ones where it counts the most.


I don't think one should penalise most programs when a few programs can
still do what they need. I folow the motto "simple things should be
simple, complicated ones possible" - complicating error handlign for
everybody because a few rare apps might actually simplify is not a good
trade-off.

Many servers; on Linux, the NFS daemons; etc


Uhm, I somehow doubt that the kernel uses libevent internally (and the
kernel is where the nfs daemons are).

(There are userspace nfs daemons, but even if they use libevent, they are
far more broken and fragile than the standard nfs daemons in linux).

Please don't assume I'm a moron. I would not have said this if I hadn'tactually seen libevent-based NFS code that ships on real systems.

Specifically, I'm talking about the version of rpc.idmapd running bydefault on RHEL 5.0.

_______________________________________________
Libevent-users mailing list
Libevent-users@monkey.org
http://monkeymail.org/mailman/listinfo/libevent-users

Re: [Libevent-users] announcing libev, towards a faster and more featureful libevent

Reply via email to