Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT
On 2015-11-05 01:29, Guillermo wrote: So problem solved. Thank you, this is a major improvement. Brill. Thank you for the confirmation.
Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT
2015-07-06 22:01 GMT-03:00 Guillermo: > > I wanted to try out nosh daemontools-style, i.e. using > service-dt-scanner and launching service-manager "by other means" > [...] > However, under certain circumstances service-dt-scanner receives a > SIGABRT signal and dies: 2015-07-12 9:45 GMT-03:00 Jonathan de Boyne Pollard: > > Remember that I said that my immediate suspicion is a (fourth) libkqueue > bug? It's a fourth libkqueue bug. > [...] > As the inotify(7) manual page says, if an event is larger than the buffer > size given to read(), it fails with EINVAL. And events can be larger than > sizeof(struct notify_event). libkqueue doesn't deal with this failure > properly, leading to a call to abort(): ... and using nosh version 1.22, the one that no longer depends on libkqueue, service-dt-scanner doesn't die anymore. Also, modifying the scan directory (e.g. placing a symbolic link to a bundle or service directory with daemontools layout, or directly copying it) correctly triggers a scan, which previously didn't work very well for me either. http://www.mail-archive.com/supervision%40list.skarnet.org/msg01043.html So problem solved. Thank you, this is a major improvement. G.
Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT
Guillermo: Have you ever considered just writing a BSD backend using kqueue / kevent, and a Linux backend using epoll / inotify or whatever native Linux mechanism is available to suit nosh's needs? I believe you mentioned there's quite a few conditional compilation already to work around libkqueue issues, and you also already have `uname` = FreeBSD and `uname` = Linux tests in the .do files. That's really the reverse of the direction that I'd like to go. I'd like to have less of this particular conditional compilation, not more. The aim is for the libkqueue bugs to be fixed, so that I can then take out the idiosyncratic Linux code.
Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT
Guillermo: Jonathan de Boyne Pollard: If there's no error output, crank up strace and see what the last few system calls are. It's probably worthwhile doing that anyway, in fact. [...] a read() call on the file descriptor returned by the inotify_init() that produces an EINVAL error, followed rt_sigprocmask() with a SIG_UNBLOCK argument, and the tgkill() that sends the SIGABRT. Remember that I said that my immediate suspicion is a (fourth) libkqueue bug? It's a fourth libkqueue bug. And it's here: * https://github.com/mheily/libkqueue/blob/master/src/linux/vnode.c#l70 As the inotify(7) manual page says, if an event is larger than the buffer size given to read(), it fails with EINVAL. And events can be larger than sizeof(struct notify_event). libkqueue doesn't deal with this failure properly, leading to a call to abort(): * https://github.com/mheily/libkqueue/blob/master/src/linux/platform.c#l181 nosh code never calls abort(), never calls raise(SIGABRT), and would have printed some kind of message if an unhandled exception had led to an abort being raised by the C++ library. The output that you are seeing from service-dt-scanner is because of a spurious wakeup. * https://github.com/mheily/libkqueue/blob/master/src/linux/platform.c#l199 You can turn these debug messages on with the KQUEUE_DEBUG=1 environment variable (and compiling the library in debug mode), apparently. * https://github.com/mheily/libkqueue/blob/master/src/common/kqueue.c#l68 libkqueue is receiving events from inotify that the caller of kevent() isn't actually interested in, resulting in a spurious wakeup from the call to kevent() with no actual event to report. The output to standard error is a minor bug in service-dt-scanner, because it assumes that every time that it is woken up and kevent() returns successfully there will be at least one event. It's finding nonsense in the event buffer and printing out a debug message when it ignores the nonsense. This is fixed in version 1.18, but this isn't really the cause of your problem here. It's just distracting log noise. The problem here is that inotify is waking kevent() up because you listed the directory. I suspect this change in your version of libkqueue, at first glance: * https://github.com/mheily/libkqueue/commit/e41cc259a0318b0e7925521d0fe3bc7433971ace After the spurious wakeup, there is another second event enqueued by the kernel, that is bigger than sizeof(struct notify_event). Whether that's an uninteresting event too, and whether it is also caused by your listing the directory, is unknown. libkqueue isn't passing a buffer big enough to read it so that we see what it is, and is abort()ing when the kernel returns an error because the read buffer is too small. This will be a tricky one for the libkqueue people to fix, since libkqueue isn't currently geared up to process multiple events from inotify at once, which it would have to be prepared for if it were to start using a bigger buffer. But it is a libkqueue problem to be fixed. All that service-dt-scanner is doing is registering just one event of interest, and calling kevent() in a fairly tight loop that's in fact doing nothing else (apart from dumping the value of the spurious event).
Re: nosh: service-dt-scanner gets repeatedly killed by SIGABRT
2015-07-08 5:30 GMT-03:00 Jonathan de Boyne Pollard: If there's no error output, crank up strace and see what the last few system calls are. It's probably worthwhile doing that anyway, in fact. Here are the last system calls shown by an strace of service-dt-scanner before the SIGABRT, with some relevant previous lines. The specific test was using ls on the service's bundle directory, which was a direct subdirectory of the scan directory, and had daemontools-compatible layout. At one point, this happens: epoll_create(1) = 4 openat(AT_FDCWD, scandir, O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6 gettid()= 1328 fstat64(6, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 Then there is a readlink() call on /proc/1328/fd/6 (scandir happened to be a symbolic link) and then: inotify_init() = 7 inotify_add_watch(7, /path-to/test/scandir, IN_MODIFY|IN_ATTRIB|IN_CLOSE) = 1 epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=159460944, u64=159460944}}) = 0 /path-to/test/scandir represents here what was the complete path to the scan directory. And now the complete final lines: fstatat64(11, down, 0xbfe6af30, 0)= -1 ENOENT (No such file or directory) rt_sigaction(SIGALRM, {0x806e971, [], 0}, {SIG_DFL, [], 0}, 8) = 0 alarm(5)= 0 openat(12, ok, O_WRONLY|O_NOCTTY|O_CLOEXEC) = 13 close(13) = 0 alarm(0)= 5 rt_sigaction(SIGALRM, {SIG_DFL, [], 0}, NULL, 8) = 0 openat(12, control, O_WRONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC) = 13 write(13, u, 1) = 1 close(13) = 0 close(12) = 0 close(11) = 0 close(10) = 0 getdents(9, /* 0 entries */, 32768) = 0 close(9)= 0 (File descriptors were 9 = the scan directory, after a dup(6), 10 = bundle directory of the service, 11 was the result of a dup(10), 12 = supervise/ subdirectory of the service's bundle directory) epoll_wait(4, {{EPOLLIN, {u32=159460944, u64=159460944}}}, 1, -1) = 1 gettid()= 1328 read(7, \1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0, 16) = 16 write(2, service-dt-scanner: DEBUG: event..., 59) = 59 (That was an ls command on the scan directory, that produced this output: service-dt-scanner: DEBUG: event filter 0 ident 6 fflags 6) epoll_wait(4, {{EPOLLIN, {u32=159460944, u64=159460944}}}, 1, -1) = 1 gettid()= 1328 read(7, 0xbfe6b0cc, 16) = -1 EINVAL (Invalid argument) rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 tgkill(1328, 1328, SIGABRT) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1328, si_uid=1000} --- +++ killed by SIGABRT +++ I did the strace a couple of different times, and all outputs ended like this, a read() call on the file descriptor returned by the inotify_init() that produces an EINVAL error, followed rt_sigprocmask() with a SIG_UNBLOCK argument, and the tgkill() that sends the SIGABRT. Thanks, G.