Re: Error handling / ev_set_syserr_cb

Nick Zavaritsky Thu, 25 Feb 2016 07:45:05 -0800

Hi,

>> 1) UB from the libev point of view, or
>> 2) works with certain versions of libev on certain OSes, but may break 
>> without warning, or
>> 3) is fully supported and is a part of the public API contract.
> 
> It's certainly 1 or 2.


I am glad this has now been stated clearly.

Do you remember the discussion a while ago, when a patch was submitted to work
around libev running out of file descriptors?

Back then you suggested longjmp(), but today you admit that it is unreliable.


> Even if it were supported under some circumstances, I'd say whatever thing
> you are trying to do is ill-designed - you'd have to wrap every call, and
> libev is not the only source of errors - if libev runs out of memory for
> example, then the program might crash at any time (e.g. because of kernel
> OOM, or because it needs more stack space or…).

Let’s talk about memory. Indeed, recovering from a stack overflow is
troublesome at best. However, with careful programming it is possible to avoid
this kind of error.

It has become customary in some projects to omit checks for allocation
failures. The rationale is that plenty of RAM is typically available and that,
with kernel overcommit, an allocation failure is deferred until the supposedly
allocated memory is actually accessed, at which point the OOM killer kicks in.

This is a reasonable programming model, though not a universal one. You surely
wouldn’t assume that an embedded system has plenty of RAM. It is also possible
to turn overcommit off.
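For comparison, a checked allocation might look like the minimal sketch below.
The helper name copy_buffer is hypothetical; the point is only that a NULL from
malloc() becomes an error code the caller can act on, which matters once
overcommit is off (on Linux, e.g. vm.overcommit_memory = 2).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of `src` into a fresh heap buffer.
 * On allocation failure it returns -ENOMEM instead of crashing, so the
 * caller can shed load, drop one request, or report the error. With
 * overcommit disabled, malloc() really can return NULL here. */
static int copy_buffer(const char *src, size_t len, char **out)
{
    char *p = malloc(len ? len : 1);   /* avoid relying on malloc(0) behaviour */
    if (p == NULL) {
        *out = NULL;
        return -ENOMEM;
    }
    memcpy(p, src, len);
    *out = p;
    return 0;
}
```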

Let’s consider networking and the file system next. It is commonly understood
that errors there will inevitably happen and must be handled. Luckily, handling
a socket or file I/O error is not that hard.
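Handling a file I/O error usually amounts to checking the return value and
propagating errno. A minimal sketch, assuming POSIX open()/read(); read_prefix
is a hypothetical helper. Note that fd shortage surfaces through the very same
path, as EMFILE from open().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to `len` bytes from `path` into `buf`.
 * Returns the number of bytes read, or -errno on failure, so the caller
 * can report or retry instead of aborting. */
static ssize_t read_prefix(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;              /* e.g. -ENOENT, -EACCES, -EMFILE */

    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    int saved = errno;              /* close() may overwrite errno */
    close(fd);
    return n < 0 ? -saved : n;
}
```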

Running out of descriptors falls somewhere in between.

If I were considering making a library like libev resilient to fd shortage, I
would ask the following questions:

A) How common is such a situation?
B) Is it possible to avoid it by e.g. carefully controlling the resource usage?
C) Does a reasonable error recovery strategy exist, and how hard is it to
implement?

My answers:

A) Quite common. The default soft limit on Linux is quite low (typically 1024)
and can be lowered further. A program that makes heavy use of the network is
rather likely to hit the limit. Moreover, this is exactly the kind of program
that benefits most from libev's performance.

B) Not really. Controlling fd usage manually is hard because many libraries
consume descriptors internally. Besides, it is complex and awkward.

C) It depends, but generally yes, especially in a network server that is
already prepared to handle connection errors.
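To make answer A concrete: the limit in question is the per-process
RLIMIT_NOFILE soft limit (what `ulimit -n` shows), and exhausting it makes
open(), pipe(), socket(), accept() and friends fail with EMFILE. A minimal
sketch, assuming Linux/POSIX; fd_soft_limit and probe_exhaustion are
hypothetical helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Current soft limit on open descriptors (what `ulimit -n` reports),
 * or -1 if it cannot be read or is unlimited. */
static long fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long)rl.rlim_cur;
}

/* Hypothetical probe: open /dev/null until open() fails or `cap` descriptors
 * are held. Stores the failing errno in *err (0 if `cap` was reached first),
 * closes everything it opened, and returns how many opens succeeded. */
static int probe_exhaustion(int *fds, int cap, int *err)
{
    int n = 0;
    *err = 0;
    while (n < cap) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            *err = errno;   /* EMFILE once the per-process limit is hit */
            break;
        }
        fds[n++] = fd;
    }
    for (int i = 0; i < n; i++)
        close(fds[i]);
    return n;
}
```

Lowering the soft limit with setrlimit(RLIMIT_NOFILE, ...) (or `ulimit -n 32`
in a shell) is also a cheap way to reproduce fd shortage in tests, long before
any memory pressure appears.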

> If you want to catch errors and do something sensible, the libev
> errorhandler is not your solution. In fact, no in-process solution exists
> - you should have a watchdog that does the right thing in fatal situations
> such as this.
> 
> Just saying... all this sounds like some inane customer feature request
> from a customer who doesn't know what he is doing and wants to go headlong
> through the nearest wall.
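The out-of-process watchdog suggested in the quote could be sketched as a
parent that restarts the real program whenever it terminates abnormally (a
crash, an abort() after a fatal error, an OOM kill). Minimal sketch, assuming
POSIX fork()/execvp()/waitpid(); supervise and max_restarts are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical watchdog: run argv[0] as a child process and restart it
 * whenever it exits abnormally. Returns 0 after a clean (status 0) exit,
 * -1 after giving up or on fork/waitpid failure. */
static int supervise(char *const argv[], int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;              /* could not even fork */
        if (pid == 0) {
            execvp(argv[0], argv);  /* child: become the real program */
            _exit(127);             /* exec failed */
        }

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return 0;               /* clean exit: stop supervising */
        /* crashed, aborted, killed, or non-zero exit: restart */
    }
    return -1;
}
```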

Maybe I am missing something, and a piece of advice would be greatly appreciated.

We are spawning a new thread with a dedicated event loop. Sometimes this fails
because of fd shortage when we set up the ev_async used to communicate with the
thread. We would like to shut the thread down cleanly and deliver an error to
the procedure waiting for the thread’s completion. That procedure is fully
prepared to handle the error (it may retry the processing without the dedicated
thread, or it may simply throw up its hands).
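One way to deliver such a failure to the waiting procedure is to have the
thread acquire its descriptors before entering its loop and hand an error code
back through pthread_join(). Minimal sketch, assuming POSIX threads; a plain
pipe() stands in for the pipe or eventfd that libev allocates for ev_async
wakeups, and worker_main, run_worker and struct worker_result are hypothetical
names.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* What the worker hands back to whoever joins it. */
struct worker_result {
    int err;   /* 0 on success, otherwise an errno value such as EMFILE */
};

/* Hypothetical worker: acquire every descriptor it needs before running its
 * loop. The pipe() stands in for the wakeup channel behind ev_async. */
static void *worker_main(void *arg)
{
    struct worker_result *res = arg;
    int wake[2];

    if (pipe(wake) != 0) {
        res->err = errno;   /* errno is per-thread, so this is our failure */
        return res;         /* unwind cleanly instead of aborting */
    }

    /* ... create the event loop, start watchers, run it ... */

    close(wake[0]);
    close(wake[1]);
    res->err = 0;
    return res;
}

/* Start the worker, wait for it, and return 0 or an errno value. */
static int run_worker(void)
{
    struct worker_result res = { 0 };
    pthread_t tid;
    void *out;

    int rc = pthread_create(&tid, NULL, worker_main, &res);
    if (rc != 0)
        return rc;          /* pthread functions return the error code */
    rc = pthread_join(tid, &out);
    if (rc != 0)
        return rc;
    return ((const struct worker_result *)out)->err;
}
```

The caller can then fall back to processing without the dedicated thread when
run_worker() returns EMFILE. This only moves the fd-hungry step to a point
where its failure can be reported; a descriptor allocation inside libev itself
can still end up on the system-error path.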

Regards,
Nick
_______________________________________________
libev mailing list
[email protected]
http://lists.schmorp.de/mailman/listinfo/libev
