On Fri, 2023-09-08 at 21:39 +0200, Marc Lehmann wrote:
> On Fri, Sep 08, 2023 at 02:32:46PM -0400, Olivier Langlois
> <oliv...@trillion01.com> wrote:
> > but beside that, I have a hard time figuring out what could cause a
> > segv into that small function...
> 
> Almost always, this is caused by race conditions between threads. For
> example, starting or stopping watchers from multiple threads without
> locking the loop.

I can imagine that being the case, but I doubt it is the culprit here.

My loop thread creates and deletes objects that each contain an async
watcher, which is started at creation and stopped at deletion.

Those objects implement the observer design pattern so that two
threads can cooperate.

The observer registration/unregistration is protected by a mutex.

> 
> > As a side note, I think that the assert text should be:
> > "libev: pipe_w not active, but pipe written"
> 
> We don't know if the pipe was written. What we do know is that the
> intent was that the pipe was not written, so officially, the pipe was
> not written, but the watcher is also not active.

Thanks for the clarification. I guess that I will need to meditate on
the meaning... maybe it is the negations and the 'but' that are bogging
down my mind on it...

The assertion would fail if execution reached that code block while the
pipe watcher was not active. This would happen if evpipe_write() had
been called (without the pipe actually being written to).
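
A toy model of that reading may make the negations easier to follow.
This is only a sketch under assumptions, not libev's actual code: the
struct, its fields, and the toy_* functions are all made up for
illustration, and the real assert is replaced by a return value so the
failing case can be observed without aborting.

```c
#include <stdbool.h>

/* Toy model only: every name here is illustrative, not a libev internal. */
struct toy_loop {
    bool pipe_w_active;  /* is the wakeup-pipe watcher started? */
    bool write_wanted;   /* the loop is about to block and wants a real write */
    bool write_skipped;  /* a wakeup was requested but the write was skipped */
    int  bytes_in_pipe;  /* stands in for the real pipe */
};

/* Request a wakeup. The "pipe" is only written if the loop asked for it;
   otherwise the request is merely recorded as skipped. */
static void toy_pipe_write(struct toy_loop *l)
{
    l->write_skipped = true;
    if (l->write_wanted) {
        l->write_skipped = false;
        l->bytes_in_pipe++;
    }
}

/* Loop-side check. Returns false where the assertion
   "pipe_w not active, but pipe not written" would fire. */
static bool toy_loop_check(struct toy_loop *l)
{
    if (l->write_skipped) {
        l->write_skipped = false;
        return l->pipe_w_active;
    }
    return true;
}

/* Returns 0 if the model behaves as described above. */
static int toy_demo(void)
{
    /* Watcher active, write skipped: the check passes, pipe untouched. */
    struct toy_loop ok = { .pipe_w_active = true };
    toy_pipe_write(&ok);
    if (!toy_loop_check(&ok)) return 1;
    if (ok.bytes_in_pipe != 0) return 2;

    /* Watcher not active, write skipped: the assertion would fail. */
    struct toy_loop bad = { .pipe_w_active = false };
    toy_pipe_write(&bad);
    if (toy_loop_check(&bad)) return 3;

    return 0;
}
```

In this model the failing case needs two things at once: a wakeup
request whose write was skipped, and a pipe watcher that is not active.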
> 
> > An exceptional occurrence but theoretically possible is the
> > following:
> > 
> > ev_async_send() is called by another thread while the loop thread
> > is
> > processing pending watchers and one of these watchers is calling
> > ev_async_stop() on the pending async watcher...
> 
> While it's probably a (semantic) bug to do so, it is harmless to call
> ev_async_send on a stopped watcher (other than losing the event, or
> getting a spurious event later). Stopping and freeing the watcher
> would be
> a different story, of course.

Any time ev_async_send() is called, there is no doubt that the watcher
is active.

The possible race condition is whether or not the async watcher will be
in pending state when stopped.

After studying the libev code, I have concluded that such an event is
harmless.

It took me a few hours to rediscover that most of the variables that
libev functions manipulate, while they look like global variables, are
in fact loop data members that all live on the heap.

This makes the memory corruption explanation more plausible.

-fsanitize=address appears to be my best tool to find my problem...

This is beyond the scope of libev discussion, but incidentally, this
observer class received a new timer watcher *this* week...

The timer watcher is rarely started. It should be stopped
unconditionally before the observer gets freed, but if somehow it was
not, I guess that this could create similar symptoms...

The funny thing is that if this timer watcher was used before the
crash, it was several hours earlier, and my traces do not clearly show
when this watcher is started or stopped. Clearly something that I can
improve too...
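
A minimal sketch of what "stop unconditionally before free, and trace
every start/stop" could look like. It does not use libev: struct
toy_timer stands in for the real timer watcher, and struct observer,
the obs_* functions, and the live_timers counter are hypothetical names
for illustration only. In real code the start and stop bodies would
call ev_timer_start() and ev_timer_stop(), and building with
-fsanitize=address should report any access to a freed observer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a timer watcher; it only tracks its state. */
struct toy_timer { bool active; };

struct observer {
    struct toy_timer timer;
    int starts, stops;  /* counters to cross-check against the traces */
};

/* Number of timers the toy "loop" still references. */
static int live_timers;

static void obs_timer_start(struct observer *o)
{
    if (o->timer.active) return;
    o->timer.active = true;
    o->starts++;
    live_timers++;
    fprintf(stderr, "observer %p: timer started\n", (void *)o);
}

static void obs_timer_stop(struct observer *o)
{
    if (!o->timer.active) return;  /* stopping an inactive timer is a no-op */
    o->timer.active = false;
    o->stops++;
    live_timers--;
    fprintf(stderr, "observer %p: timer stopped\n", (void *)o);
}

static struct observer *obs_create(void)
{
    return calloc(1, sizeof(struct observer));
}

/* Always stop the timer before releasing the memory, so the loop never
   keeps a pointer into a freed observer. */
static void obs_destroy(struct observer *o)
{
    if (!o) return;
    obs_timer_stop(o);
    free(o);
}

/* Returns the number of timers still referenced after teardown (0 is good). */
static int obs_demo(void)
{
    struct observer *o = obs_create();
    if (!o) return -1;
    obs_timer_start(o);
    obs_destroy(o);  /* the timer was still running; destroy stops it first */
    return live_timers;
}
```

Putting the stop inside the destroy path means no caller can forget it,
and the trace lines give a timestampable record of each start and stop.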



_______________________________________________
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/mailman/listinfo/libev