Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT

2015-11-05 Thread Jonathan de Boyne Pollard

On 2015-11-05 01:29, Guillermo wrote:

So problem solved. Thank you, this is a major improvement.


Brill.  Thank you for the confirmation.



Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT

2015-11-04 Thread Guillermo
2015-07-06 22:01 GMT-03:00 Guillermo:
>
> I wanted to try out nosh daemontools-style, i.e. using
> service-dt-scanner and launching service-manager "by other means"
> [...]
> However, under certain circumstances service-dt-scanner receives a
> SIGABRT signal and dies:

2015-07-12 9:45 GMT-03:00 Jonathan de Boyne Pollard:
>
> Remember that I said that my immediate suspicion is a (fourth) libkqueue
> bug?  It's a fourth libkqueue bug.
> [...]
> As the inotify(7) manual page says, if an event is larger than the buffer
> size given to read(), it fails with EINVAL.  And events can be larger than
> sizeof(struct notify_event).  libkqueue doesn't deal with this failure
> properly, leading to a call to abort():

... and using nosh version 1.22, the one that no longer depends on
libkqueue, service-dt-scanner doesn't die anymore. Also, modifying the
scan directory (e.g. placing a symbolic link to a bundle or service
directory with daemontools layout, or directly copying it) correctly
triggers a scan, which previously didn't work very well for me either.

http://www.mail-archive.com/supervision%40list.skarnet.org/msg01043.html

So problem solved. Thank you, this is a major improvement.
G.


Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT

2015-07-14 Thread Jonathan de Boyne Pollard

Guillermo:
Have you ever considered just writing a BSD backend using kqueue / 
kevent, and a Linux backend using epoll / inotify or whatever native 
Linux mechanism is available to suit nosh's needs? I believe you 
mentioned there's quite a few conditional compilation already to work 
around libkqueue issues, and you also already have `uname` = FreeBSD 
and `uname` = Linux tests in the .do files.


That's really the reverse of the direction that I'd like to go.  I'd 
like to have less of this particular conditional compilation, not more.  
The aim is for the libkqueue bugs to be fixed, so that I can then take 
out the idiosyncratic Linux code.


Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT

2015-07-12 Thread Jonathan de Boyne Pollard

Guillermo:

Jonathan de Boyne Pollard:

If there's no error output, crank up strace and see what the last few system 
calls are.  It's probably worthwhile doing that anyway, in fact.

[...]

a read() call on the file descriptor returned by the inotify_init() that 
produces an EINVAL error, followed rt_sigprocmask() with a SIG_UNBLOCK 
argument, and the tgkill() that sends the SIGABRT.


Remember that I said that my immediate suspicion is a (fourth) libkqueue 
bug?  It's a fourth libkqueue bug.


And it's here:

* https://github.com/mheily/libkqueue/blob/master/src/linux/vnode.c#l70

As the inotify(7) manual page says, if an event is larger than the 
buffer size given to read(), it fails with EINVAL.  And events can be 
larger than sizeof(struct notify_event).  libkqueue doesn't deal with 
this failure properly, leading to a call to abort():


* https://github.com/mheily/libkqueue/blob/master/src/linux/platform.c#l181

nosh code never calls abort(), never calls raise(SIGABRT), and would 
have printed some kind of message if an unhandled exception had led to 
an abort being raised by the C++ library.


The output that you are seeing from service-dt-scanner is because of a 
spurious wakeup.


* https://github.com/mheily/libkqueue/blob/master/src/linux/platform.c#l199

You can turn these debug messages on with the KQUEUE_DEBUG=1 environment 
variable (and compiling the library in debug mode), apparently.


* https://github.com/mheily/libkqueue/blob/master/src/common/kqueue.c#l68

libkqueue is receiving events from inotify that the caller of kevent() 
isn't actually interested in, resulting in a spurious wakeup from the 
call to kevent() with no actual event to report. The output to standard 
error is a minor bug in service-dt-scanner, because it assumes that 
every time that it is woken up and kevent() returns successfully there 
will be at least one event.  It's finding nonsense in the event buffer 
and printing out a debug message when it ignores the nonsense.  This is 
fixed in version 1.18, but this isn't really the cause of your problem 
here.  It's just distracting log noise.


The problem here is that inotify is waking kevent() up because you 
listed the directory.  I suspect this change in your version of 
libkqueue, at first glance:


* 
https://github.com/mheily/libkqueue/commit/e41cc259a0318b0e7925521d0fe3bc7433971ace


After the spurious wakeup, there is another second event enqueued by the 
kernel, that is bigger than sizeof(struct notify_event). Whether that's 
an uninteresting event too, and whether it is also caused by your 
listing the directory, is unknown.  libkqueue isn't passing a buffer big 
enough to read it so that we see what it is, and is abort()ing when the 
kernel returns an error because the read buffer is too small.


This will be a tricky one for the libkqueue people to fix, since 
libkqueue isn't currently geared up to process multiple events from 
inotify at once, which it would have to be prepared for if it were to 
start using a bigger buffer.  But it is a libkqueue problem to be 
fixed.  All that service-dt-scanner is doing is registering just one 
event of interest, and calling kevent() in a fairly tight loop that's in 
fact doing nothing else (apart from dumping the value of the spurious 
event).


Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT

2015-07-09 Thread Guillermo
2015-07-08 5:30 GMT-03:00 Jonathan de Boyne Pollard:

 If there's no error output, crank up strace and see what the last few system
 calls are.  It's probably worthwhile doing that anyway, in fact.

Here are the last system calls shown by an strace of
service-dt-scanner before the SIGABRT, with some relevant previous
lines. The specific test was using ls on the service's bundle
directory, which was a direct subdirectory of the scan directory, and
had daemontools-compatible layout.

At one point, this happens:

epoll_create(1) = 4
openat(AT_FDCWD, scandir,
O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
gettid()= 1328
fstat64(6, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0

Then there is a readlink() call on /proc/1328/fd/6 (scandir
happened to be a symbolic link) and then:

inotify_init()  = 7
inotify_add_watch(7, /path-to/test/scandir, IN_MODIFY|IN_ATTRIB|IN_CLOSE) = 1
epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=159460944, u64=159460944}}) = 0

/path-to/test/scandir represents here what was the complete path to
the scan directory.

And now the complete final lines:

fstatat64(11, down, 0xbfe6af30, 0)= -1 ENOENT (No such file or directory)
rt_sigaction(SIGALRM, {0x806e971, [], 0}, {SIG_DFL, [], 0}, 8) = 0
alarm(5)= 0
openat(12, ok, O_WRONLY|O_NOCTTY|O_CLOEXEC) = 13
close(13)   = 0
alarm(0)= 5
rt_sigaction(SIGALRM, {SIG_DFL, [], 0}, NULL, 8) = 0
openat(12, control, O_WRONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC) = 13
write(13, u, 1)   = 1
close(13)   = 0
close(12)   = 0
close(11)   = 0
close(10)   = 0
getdents(9, /* 0 entries */, 32768) = 0
close(9)= 0

(File descriptors were 9 = the scan directory, after a dup(6), 10 =
bundle directory of the service, 11 was the result of a dup(10), 12 =
supervise/ subdirectory of the service's bundle directory)

epoll_wait(4, {{EPOLLIN, {u32=159460944, u64=159460944}}}, 1, -1) = 1
gettid()= 1328
read(7, \1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0, 16) = 16
write(2, service-dt-scanner: DEBUG: event..., 59) = 59

(That was an ls command on the scan directory, that produced this
output: service-dt-scanner: DEBUG: event filter 0 ident 6 fflags 6)

epoll_wait(4, {{EPOLLIN, {u32=159460944, u64=159460944}}}, 1, -1) = 1
gettid()= 1328
read(7, 0xbfe6b0cc, 16) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(1328, 1328, SIGABRT) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1328, si_uid=1000} ---
+++ killed by SIGABRT +++

I did the strace a couple of different times, and all outputs ended
like this, a read() call on the file descriptor returned by the
inotify_init() that produces an EINVAL error, followed
rt_sigprocmask() with a SIG_UNBLOCK argument, and the tgkill() that
sends the SIGABRT.

Thanks,
G.