Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Fri, 16 Aug 2019, Uoti Urpala wrote:
> On Thu, 2019-08-15 at 20:36 +1000, Michael Chapman wrote:
> > With systemd 239 I was unable to cause an fd leak this way.
> >
> > Still, I would feel more comfortable if I could find a commit that
> > definitely fixed the problem. All of these experiments are just
> > circumstantial evidence.
>
> 5ae37ad833583e6c1c7765767b7f8360afca3b07
> sd-bus: when attached to an sd-event loop, disconnect on processing errors

Ah, nice. So since systemd v237.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, 2019-08-15 at 20:36 +1000, Michael Chapman wrote:
> With systemd 239 I was unable to cause an fd leak this way.
>
> Still, I would feel more comfortable if I could find a commit that
> definitely fixed the problem. All of these experiments are just
> circumstantial evidence.

5ae37ad833583e6c1c7765767b7f8360afca3b07
sd-bus: when attached to an sd-event loop, disconnect on processing errors
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, 15 Aug 2019, Lennart Poettering wrote:
> On Mi, 14.08.19 22:36, Michael Chapman (m...@very.puzzling.org) wrote:
> > On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > > Well, a D-Bus connection can remain open indefinitely, and may even
> > > have incomplete "half" messages queued in them as long as the client
> > > desires. After the initial authentication is done, clients may thus
> > > take up resources as long as they want; this is by design of dbus,
> > > really, and is different from HTTP for example, where connections
> > > usually have a time-out applied. dbus doesn't know timeouts for
> > > established connections. It knows them for the authentication phase,
> > > and it knows them for method calls that are in flight, but it does not
> > > know them for the mere existence of an established connection.
> >
> > I'm sure it's not in the design of DBus that clients can continue to
> > consume those resources after they've disconnected.
> >
> > > PID 1 authenticates clients of the private connection simply by making
> > > the socket for it inaccessible to anyone who is not privileged. Due to
> > > that it gets away with not doing any further per-user accounting,
> > > because it knows the clients are all privileged anyway.
> > >
> > > So, yes, it would be good if we could protect us from any form of
> > > misuse, but basically, if you have a root client that misbehaves you
> > > have to accept that...
> >
> > I understand all that. Nevertheless, Brian identified a bug: after
> > receiving certain data on its private socket, the systemd process can
> > leak a file descriptor.
>
> Can it? Did I miss something? If the client closes the client side of
> the socket, but PID 1 would keep the server side of it open anyway,
> then that would be a bug indeed. But my understanding was that the
> client side stays pinned?

I was able to reproduce the bug on CentOS 7's systemd 219. That is, the
file descriptor in PID 1 was dropped from its epoll set without it
reaching EOF and without it being closed. Every time I ran Brian's
command, PID 1 would leak another file descriptor.

I was unable to reproduce this on a later version of systemd, but that
_could_ just be because this later version of systemd side-steps the
issue by ensuring that systemctl doesn't use fd 1 for its socket. I have
some reason to believe the problem in PID 1 has been fixed, though. On
CentOS 7 I was able to cause it to sometimes leak an fd simply by
sending random data to it:

  # count-sockets() { ss -x | grep /run/systemd/private | wc -l; }
  # inject-junk() { timeout 1s nc -U /run/systemd/private
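A self-contained sketch of that experiment (the function bodies here are hypothetical reconstructions, since the originals were cut off above; it assumes root on a systemd host with `ss` and `nc` installed):

```shell
# Sketch only: count PID 1's connections on the private socket, and push
# junk at it. The exact junk payload used in the original test is unknown;
# /dev/urandom stands in for it here.
count_sockets() { ss -x | grep /run/systemd/private | wc -l; }
inject_junk()   { head -c 512 /dev/urandom | timeout 1s nc -U /run/systemd/private; }
# Usage (as root): run count_sockets before and after a few inject_junk
# calls; on an affected systemd the count keeps growing.
```

On a fixed systemd, the count should return to its previous value once the injecting client disconnects.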
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Mi, 14.08.19 22:36, Michael Chapman (m...@very.puzzling.org) wrote:
> On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > Well, a D-Bus connection can remain open indefinitely, and may even
> > have incomplete "half" messages queued in them as long as the client
> > desires. After the initial authentication is done, clients may thus
> > take up resources as long as they want; this is by design of dbus,
> > really, and is different from HTTP for example, where connections
> > usually have a time-out applied. dbus doesn't know timeouts for
> > established connections. It knows them for the authentication phase,
> > and it knows them for method calls that are in flight, but it does not
> > know them for the mere existence of an established connection.
>
> I'm sure it's not in the design of DBus that clients can continue to
> consume those resources after they've disconnected.
>
> > PID 1 authenticates clients of the private connection simply by making
> > the socket for it inaccessible to anyone who is not privileged. Due to
> > that it gets away with not doing any further per-user accounting,
> > because it knows the clients are all privileged anyway.
> >
> > So, yes, it would be good if we could protect us from any form of
> > misuse, but basically, if you have a root client that misbehaves you
> > have to accept that...
>
> I understand all that. Nevertheless, Brian identified a bug: after
> receiving certain data on its private socket, the systemd process can
> leak a file descriptor.

Can it? Did I miss something? If the client closes the client side of
the socket, but PID 1 would keep the server side of it open anyway,
then that would be a bug indeed. But my understanding was that the
client side stays pinned?

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019 12:04:03 -0400 Brian Reichert wrote:
> On Wed, Aug 14, 2019 at 04:19:46PM +0100, Simon McVittie wrote:
> > On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> > Doesn't daemonize(1) make stdin, stdout and stderr point
> > to /dev/null, instead of closing them?
>
> Looking at the source, yes, it does.
>
> > Expecting arbitrary subprocesses to cope gracefully with being
> > invoked without the three standard fds seems likely to be a losing
> > battle. I've implemented this myself, in dbus; it isn't a whole lot
> > of code, but it also isn't something that I would expect the
> > authors of all CLI tools to get right.
>
> I concede that reopening FD 0,1,2 is a good practice to insulate
> against the issues you cite.
>
> I agree with your points; I code aggressively, and sometimes forget
> others don't.

I didn't know what you meant by this. Do you mean 'Aggressive
Programming'? Is
https://www.apharmony.com/software-sagacity/2014/10/principles-of-aggressive-programming/
a reasonable summary?

> > smcv
> >
> > [1] I'm sure there are lots of other executables named daemon or
> > daemonize in other OSs, and perhaps some of them get this wrong?
>
> --
> Brian Reichert
> BSD admin/developer at large
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Aug 14, 2019 at 04:19:46PM +0100, Simon McVittie wrote:
> On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> Doesn't daemonize(1) make stdin, stdout and stderr point to /dev/null,
> instead of closing them?

Looking at the source, yes, it does.

> Expecting arbitrary subprocesses to cope gracefully with being invoked
> without the three standard fds seems likely to be a losing battle.
> I've implemented this myself, in dbus; it isn't a whole lot of code,
> but it also isn't something that I would expect the authors of all CLI
> tools to get right.

I concede that reopening FD 0,1,2 is a good practice to insulate
against the issues you cite.

I agree with your points; I code aggressively, and sometimes forget
others don't.

> smcv
>
> [1] I'm sure there are lots of other executables named daemon or daemonize
> in other OSs, and perhaps some of them get this wrong?

--
Brian Reichert
BSD admin/developer at large
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> And, if I were to ever use daemonize(1), or any other canonical
> mechanism for daemonizing code, STDOUT would normally be closed
> under those circumstances, as well.

Doesn't daemonize(1) make stdin, stdout and stderr point to /dev/null,
instead of closing them? That's what glibc daemon(3) does, and seems a
lot better as a way to not break subprocesses' expectations. The man
pages of the daemon(1) and daemonize(1) in Debian[1] also both say they
do this, and it's what is recommended in systemd's own daemon(7).

Expecting arbitrary subprocesses to cope gracefully with being invoked
without the three standard fds seems likely to be a losing battle. I've
implemented this myself, in dbus; it isn't a whole lot of code, but it
also isn't something that I would expect the authors of all CLI tools
to get right. dbus only does this because X11 autolaunching can lead to
dbus-daemon being invoked in a broken execution environment, and I
would have considered the equivalent bug in most tools to be WONTFIX.

Expecting arbitrary *library code* to cope gracefully with being
invoked without the three standard fds seems like a non-starter,
because these fds are process-global state, so library code cannot
safely mess with them unless specifically asked to do so by the calling
process. In general library code can't do this safely anyway, because
as soon as the process has a second thread, it becomes impossible to
fix the fds in one thread without risking breaking another thread -
this is one of the operations that is usually only safe to do right at
the beginning of main(), similar to setenv().

smcv

[1] I'm sure there are lots of other executables named daemon or
daemonize in other OSs, and perhaps some of them get this wrong?
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Mi, 14.08.19 10:26, Brian Reichert (reich...@numachi.com) wrote:
> On Wed, Aug 14, 2019 at 11:34:21AM +0200, Lennart Poettering wrote:
> > Hence: your code that closes fd1 like this is simply buggy. Don't do
> > that, you are shooting yourself in the foot.
>
> Buggy or no, this is fifteen-year-old code, and prior cron/service
> mgmt framework implementations had no issue.
>
> And, if I were to ever use daemonize(1), or any other canonical
> mechanism for daemonizing code, STDOUT would normally be closed
> under those circumstances, as well.

In daemons, stdin/stdout/stderr are generally closed and then opened as
/dev/null again, precisely to avoid problems like that, so that fds 0,
1, 2 are taken and any future fd allocations are > 2. This is what
BSD/glibc daemon(3) does for example, and most daemons do it like that.

printf() and friends assume that fd1 is stdout. fgets() and friends
assume that fd0 is stdin. error() and friends assume that fd2 is
stderr. Most command line programs (such as ls, echo or systemctl)
assume that fd0, 1, 2 are defined that way, and they will do their
output and input according to that. If you close these fds before
invoking these programs they will do borked stuff too...

Example:

  telnet www.microsoft.de 80 2>&1 >&-

See, this will connect to port 80, and do so with a socket which is
allocated as fd1, because we called this tool with fd1 closed. Thus it
interferes with stdout, and that one-line string that says "Trying
…\nConnected to …\r\nEscape character is '^]'" will be written to the
peer instead of the local tty!

That works with any tool. If you invoke a tool with stdout/stderr/stdin
not connected, weird shit happens as soon as the app allocates the
first fd. systemctl is just one of many.

Note that the robustness we added to sd-bus which explicitly moves the
fd >= 3 is not supposed to make things work in cases where systemctl is
called with closed stdin, stdout or stderr. Instead, we made this
change since the sd-bus code is used in NSS clients, i.e. as a backend
for gethostbyname(), and thus we really don't know in which kind of
contexts it is used. I.e. it might very well be called inside a process
that is currently rearranging stdin/stdout/stderr and thus might have
closed them temporarily.

Lennart

--
Lennart Poettering, Berlin
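The same misbehaviour can be shown with any tool that opens a file. A hedged illustration using cat (GNU coreutils assumed): with fd 1 closed, cat's open() of the input file returns fd 1, so cat's writes to "stdout" hit the read-only input file and fail:

```shell
# Create a small input file, then run cat with stdout closed. The open() of
# the input file gets fd 1 (the lowest free descriptor), so cat's output
# writes land on the read-only input file and fail with EBADF.
printf 'hello\n' > /tmp/lowfd-demo.txt
sh -c 'exec 1>&-; cat /tmp/lowfd-demo.txt' 2>/dev/null
echo "cat exit status with fd 1 closed: $?"
rm -f /tmp/lowfd-demo.txt
```

With `exec 1>/dev/null` instead of `exec 1>&-`, cat exits 0: the fd is occupied, so the input file lands on fd >= 3 and stdout writes go harmlessly to /dev/null.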
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Aug 14, 2019 at 11:34:21AM +0200, Lennart Poettering wrote:
> Hence: your code that closes fd1 like this is simply buggy. Don't do
> that, you are shooting yourself in the foot.

Buggy or no, this is fifteen-year-old code, and prior cron/service
mgmt framework implementations had no issue.

And, if I were to ever use daemonize(1), or any other canonical
mechanism for daemonizing code, STDOUT would normally be closed under
those circumstances, as well.

I'm wading into hypotheticals here, but daemonized code should, in
turn, be able to invoke whatever code it subsequently wants.

> Lennart
>
> --
> Lennart Poettering, Berlin

--
Brian Reichert
BSD admin/developer at large
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019, Lennart Poettering wrote:
> Well, a D-Bus connection can remain open indefinitely, and may even
> have incomplete "half" messages queued in them as long as the client
> desires. After the initial authentication is done, clients may thus
> take up resources as long as they want; this is by design of dbus,
> really, and is different from HTTP for example, where connections
> usually have a time-out applied. dbus doesn't know timeouts for
> established connections. It knows them for the authentication phase,
> and it knows them for method calls that are in flight, but it does not
> know them for the mere existence of an established connection.

I'm sure it's not in the design of DBus that clients can continue to
consume those resources after they've disconnected.

> PID 1 authenticates clients of the private connection simply by making
> the socket for it inaccessible to anyone who is not privileged. Due to
> that it gets away with not doing any further per-user accounting,
> because it knows the clients are all privileged anyway.
>
> So, yes, it would be good if we could protect us from any form of
> misuse, but basically, if you have a root client that misbehaves you
> have to accept that...

I understand all that. Nevertheless, Brian identified a bug: after
receiving certain data on its private socket, the systemd process can
leak a file descriptor. All we need to establish is whether that bug
still exists in the current version of systemd. _Even if_ it isn't a
security issue, having one fewer bug is a good thing.
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Mi, 14.08.19 19:47, Michael Chapman (m...@very.puzzling.org) wrote:
> On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
> > really bad idea in general. That systemctl nowadays is particularly
> > careful and deals with situations like that is not an invitation to
> > actually invoke things like this. After all, UNIX guarantees that
> > open() (and the other calls that allocate an fd) allocates the lowest
> > available one, hence they will necessarily step into
> > stdin/stdout/stderr territory when invoked with any of those fds
> > closed.
> >
> > Hence: your code that closes fd1 like this is simply buggy. Don't do
> > that, you are shooting yourself in the foot. And even if newer
> > systemctl will handle cases like this more gracefully, pretty much any
> > other tool you might call will break in some form or another too,
> > since a simple printf() will already spill into the wrong fds in that
> > case.
>
> That's all true, but the thing we need to check here is that systemd
> correctly handles junk on the /run/systemd/private socket. The change on
> the systemctl side certainly tries to prevent incorrect data being sent
> down the socket -- though it looks like there are several ways in which
> fd_move_above_stdio() can fail, so this isn't foolproof -- but we need to
> ensure that some _malicious_ client can't DoS systemd.

Well, a D-Bus connection can remain open indefinitely, and may even
have incomplete "half" messages queued in them as long as the client
desires. After the initial authentication is done, clients may thus
take up resources as long as they want; this is by design of dbus,
really, and is different from HTTP for example, where connections
usually have a time-out applied. dbus doesn't know timeouts for
established connections. It knows them for the authentication phase,
and it knows them for method calls that are in flight, but it does not
know them for the mere existence of an established connection.

PID 1 authenticates clients of the private connection simply by making
the socket for it inaccessible to anyone who is not privileged. Due to
that it gets away with not doing any further per-user accounting,
because it knows the clients are all privileged anyway.

So, yes, it would be good if we could protect us from any form of
misuse, but basically, if you have a root client that misbehaves you
have to accept that...

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019, Lennart Poettering wrote:
> Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
> really bad idea in general. That systemctl nowadays is particularly
> careful and deals with situations like that is not an invitation to
> actually invoke things like this. After all, UNIX guarantees that
> open() (and the other calls that allocate an fd) allocates the lowest
> available one, hence they will necessarily step into
> stdin/stdout/stderr territory when invoked with any of those fds
> closed.
>
> Hence: your code that closes fd1 like this is simply buggy. Don't do
> that, you are shooting yourself in the foot. And even if newer
> systemctl will handle cases like this more gracefully, pretty much any
> other tool you might call will break in some form or another too,
> since a simple printf() will already spill into the wrong fds in that
> case.
>
> Lennart

That's all true, but the thing we need to check here is that systemd
correctly handles junk on the /run/systemd/private socket. The change
on the systemctl side certainly tries to prevent incorrect data being
sent down the socket -- though it looks like there are several ways in
which fd_move_above_stdio() can fail, so this isn't foolproof -- but we
need to ensure that some _malicious_ client can't DoS systemd.
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Mi, 14.08.19 18:53, Michael Chapman (m...@very.puzzling.org) wrote:
> On Wed, 14 Aug 2019, Michael Chapman wrote:
> > On Wed, 14 Aug 2019, Brian Reichert wrote:
> > > On Thu, Aug 01, 2019 at 07:18:20PM +0000, Zbigniew Jędrzejewski-Szmek
> > > wrote:
> > > > Yes. (With the caveat that there *are* legitimate reasons to have new
> > > > long-lived fds created, so not every long-lived fd is "wrong".)
> > >
> > > I finally was able to track down what's happening on my system.
> > >
> > > This is sufficient to reproduce the effect of increasing the number
> > > of file descriptors open to /run/systemd/private; at least, on my
> > > box, in its current state:
> > >
> > > sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'
> >
> > I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's
> > systemd 239.
>
> Looking into this further, my guess is that this is fixed in:
>
> commit 7fe2903c2315db9d9f501ae255a6b6e4f01ba46c
> Author: Lennart Poettering
> Date:   Fri Feb 9 17:53:28 2018 +0100
>
>     fd-util: move certain fds above fd #2 (#8129)
>
> which was introduced in systemd 238.
>
> What's happening is that systemctl is getting fd 1 for the socket to
> /run/systemd/private, as it's the lowest unused fd, but fd 1 (aka stdout)
> is also where it sends its regular human-readable output. This commit
> ensures that the socket is moved to a file descriptor that can't be
> mistaken for a standard stream.
>
> I'm not sure if systemd >= 238 itself still suffers from the problem where
> any "extra" data on that connection causes it to leak its file descriptor.
> That may still be a problem, or it may have been fixed by a different
> commit.

Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
really bad idea in general. That systemctl nowadays is particularly
careful and deals with situations like that is not an invitation to
actually invoke things like this. After all, UNIX guarantees that
open() (and the other calls that allocate an fd) allocates the lowest
available one, hence they will necessarily step into stdin/stdout/stderr
territory when invoked with any of those fds closed.

Hence: your code that closes fd1 like this is simply buggy. Don't do
that, you are shooting yourself in the foot. And even if newer systemctl
will handle cases like this more gracefully, pretty much any other tool
you might call will break in some form or another too, since a simple
printf() will already spill into the wrong fds in that case.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019, Michael Chapman wrote:
> On Wed, 14 Aug 2019, Brian Reichert wrote:
> > On Thu, Aug 01, 2019 at 07:18:20PM +0000, Zbigniew Jędrzejewski-Szmek
> > wrote:
> > > Yes. (With the caveat that there *are* legitimate reasons to have new
> > > long-lived fds created, so not every long-lived fd is "wrong".)
> >
> > I finally was able to track down what's happening on my system.
> >
> > This is sufficient to reproduce the effect of increasing the number
> > of file descriptors open to /run/systemd/private; at least, on my
> > box, in its current state:
> >
> > sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'
>
> I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's
> systemd 239.

Looking into this further, my guess is that this is fixed in:

commit 7fe2903c2315db9d9f501ae255a6b6e4f01ba46c
Author: Lennart Poettering
Date:   Fri Feb 9 17:53:28 2018 +0100

    fd-util: move certain fds above fd #2 (#8129)

which was introduced in systemd 238.

What's happening is that systemctl is getting fd 1 for the socket to
/run/systemd/private, as it's the lowest unused fd, but fd 1 (aka
stdout) is also where it sends its regular human-readable output. This
commit ensures that the socket is moved to a file descriptor that can't
be mistaken for a standard stream.

I'm not sure if systemd >= 238 itself still suffers from the problem
where any "extra" data on that connection causes it to leak its file
descriptor. That may still be a problem, or it may have been fixed by a
different commit.
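The effect of that commit can be imitated in shell. This is only an analogue (systemd's C implementation uses fcntl(F_DUPFD) to get the first free fd at or above 3, while bash's automatic `{var}` allocation happens to start at 10), but it shows the idea of getting a descriptor out of the 0-2 range:

```shell
# In a throwaway subshell: close stdout, let /dev/null "accidentally" become
# fd 1, then duplicate it to a bash-chosen free fd and close fd 1 again.
bash -c '
    exec 1>&-              # stdout closed, as in the buggy cronjob
    exec 1</dev/null       # a new descriptor lands on fd 1
    exec {moved}<&1 1<&-   # move it out of stdio range, free fd 1
    echo "descriptor moved to fd $moved" >&2
' 2>&1
```

After the move, a later open() can still return fd 1, but the important descriptor can no longer be confused with a standard stream.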
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, 14 Aug 2019, Brian Reichert wrote:
> On Thu, Aug 01, 2019 at 07:18:20PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > Yes. (With the caveat that there *are* legitimate reasons to have new
> > long-lived fds created, so not every long-lived fd is "wrong".)
>
> I finally was able to track down what's happening on my system.
>
> This is sufficient to reproduce the effect of increasing the number
> of file descriptors open to /run/systemd/private; at least, on my
> box, in its current state:
>
> sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'

I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's
systemd 239.

On CentOS 7 I took two `strace -e desc -p 1` runs, comparing:

  # good
  sh -c 'exec 1>/dev/null; systemctl status tmp.mount'

with:

  # bad
  sh -c 'exec 1>&-; systemctl status tmp.mount'

The diff is:

  # diff -u /tmp/good /tmp/bad
  --- /tmp/good 2019-08-14 18:11:20.100792406 +1000
  +++ /tmp/bad  2019-08-14 18:11:24.882886972 +1000
  @@ -6,7 +6,7 @@
   fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
   epoll_ctl(4, EPOLL_CTL_ADD, 24, {0, {u32=107062384, u64=94382013391984}}) = 0
   epoll_ctl(4, EPOLL_CTL_MOD, 24, {EPOLLIN|EPOLLOUT, {u32=107062384, u64=94382013391984}}) = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={1136546, 444853000}}, NULL) = 0
  +timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={1136551, 444853000}}, NULL) = 0
   epoll_wait(4, [{EPOLLOUT, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
   epoll_ctl(4, EPOLL_CTL_MOD, 24, {EPOLLIN, {u32=107062384, u64=94382013391984}}) = 0
   epoll_wait(4, [{EPOLLIN, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
  @@ -23,13 +23,7 @@
   openat(AT_FDCWD, "/usr/local/lib/systemd/system/tmp.mount.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
   openat(AT_FDCWD, "/usr/lib/systemd/system/tmp.mount.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
   openat(AT_FDCWD, "/run/systemd/generator.late/tmp.mount.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  -epoll_wait(4, [{EPOLLIN|EPOLLHUP, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
  -epoll_ctl(4, EPOLL_CTL_MOD, 24, {0, {u32=107062384, u64=94382013391984}}) = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 1}}, NULL) = 0
  -epoll_wait(4, [{EPOLLHUP, {u32=107062384, u64=94382013391984}}, {EPOLLIN, {u32=3, u64=3}}], 58, -1) = 2
  -read(3, "\1\0\0\0\0\0\0\0", 8) = 8
  +epoll_wait(4, [{EPOLLIN, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
   epoll_ctl(4, EPOLL_CTL_DEL, 24, NULL) = 0
  -close(24) = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={1136622, 194853000}}, NULL) = 0
   epoll_wait(4,
  strace: Process 1 detached

So it looks like systemd is removing the file descriptor from the epoll
instance in both cases. However, in the "bad" case no EOF was reached,
and the file descriptor is never closed. Looking at `ss`, there are
still 387 bytes in this descriptor's receive queue.
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Aug 01, 2019 at 07:18:20PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> Yes. (With the caveat that there *are* legitimate reasons to have new
> long-lived fds created, so not every long-lived fd is "wrong".)

I finally was able to track down what's happening on my system.

This is sufficient to reproduce the effect of increasing the number of
file descriptors open to /run/systemd/private; at least, on my box, in
its current state:

  sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'

We have a cronjob that closes STDOUT, remaps STDERR to a log file, and
runs this systemctl command. In my environment, this one-liner will
cause that FD count to go up, 100% reproducibly. Somehow, closing
STDOUT is necessary to see this.

FWIW, the strace effort didn't yield anything; instead, I configured
auditd to reveal when systemctl was invoked, and found a pattern of
invocations I was able to backtrack to the cronjob.

> Zbyszek

--
Brian Reichert
BSD admin/developer at large
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Aug 01, 2019 at 08:46:58AM -0400, Brian Reichert wrote:
> On Thu, Aug 01, 2019 at 08:17:01AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > The kernel will use the lowest-numbered available fd, so there's a lot of
> > "reuse" of the same numbers happening. This strace means that between
> > each of those close()s here, some other function call returned fd 19.
> > Until we know what those calls are, we cannot say why fd19 remains
> > open. (In fact, the only thing we can say for sure is that the
> > accept4() call shown above is not relevant.)
>
> So, what I propose at this step:
>
> - Restart my strace, this time using '-e trace=desc' (trace all
>   file descriptor related system calls).
>
> - Choose to focus on a single descriptor; when I passively notice
>   that '19' has been reused a couple of times, stop the trace.

Well, no. If you notice that '19' has STOPPED being reused, then stop
the trace. If it is being reused, then it's not "leaked".

Zbyszek
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Aug 01, 2019 at 08:17:01AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> The kernel will use the lowest-numbered available fd, so there's a lot of
> "reuse" of the same numbers happening. This strace means that between
> each of those close()s here, some other function call returned fd 19.
> Until we know what those calls are, we cannot say why fd19 remains
> open. (In fact, the only thing we can say for sure is that the
> accept4() call shown above is not relevant.)

So, what I propose at this step:

- Restart my strace, this time using '-e trace=desc' (trace all file
  descriptor related system calls).

- Choose to focus on a single descriptor; when I passively notice that
  '19' has been reused a couple of times, stop the trace.

That should give me a smaller trace to analyze.

> Zbyszek

--
Brian Reichert
BSD admin/developer at large
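For the filtering step, a slightly tighter regex than the earlier `'(19'` catches fd 19 both as a first argument and as a return value. A sketch, using sample lines modelled on the strace output earlier in the thread:

```shell
# Match fd 19 when it appears as the first argument "(19," / "(19)" or as a
# syscall's return value "= 19". The sample mimics a desc-only strace log.
cat > /tmp/trace.sample <<'EOF'
accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19
getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0
close(19) = 0
timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}}, NULL) = 0
EOF
grep -cE '\(19[,)]| = 19$' /tmp/trace.sample   # counts the three fd-19 lines
rm -f /tmp/trace.sample
```

Against a real trace of PID 1 (`strace -e trace=desc -p 1 -o pid1.trace`, as root), the same pattern pulls out every event in one descriptor's lifecycle.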
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Jul 31, 2019 at 11:37:31AM -0400, Brian Reichert wrote: > On Wed, Jul 31, 2019 at 12:36:41AM +0300, Uoti Urpala wrote: > > On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote: > > > I see, between 13:49:30 and 13:50:01, I see 25 'successful' calls > > > for close(), e.g.: > > > > > > 13:50:01 close(19) = 0 > > > > > > Followed by getsockopt(), and a received message on the supposedly-closed > > > file descriptor: > > > > > > 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, > > > gid=0}, [12]) = 0 > > > > Are you sure it's the same file descriptor? You don't explicitly say > > anything about there not being any relevant lines between those. Does > > systemd really just call getsockopt() on fd 19 after closing it, with > > nothing to trigger that? Obvious candidates to check in the strace > > would be an accept call returning a new fd 19, or epoll indicating > > activity on the fd (though I'd expect systemd to remove the fd from the > > epoll set after closing it). > > My analysis is naive. > > There was an earlier suggestion to use strace, limiting it to a > limited number of system calls. > > I then used a simple RE to look for the string '(19', to see calls where > '19' was used as an initial argument to system calls. That's way too > simplistic. > > To address some of your questions/points. > > - No, I don't know if it's the same file descriptor. I could not > start strace early enough to catch the creation of several dozen > file descriptors. This shouldn't really matter. We care about descriptors which are created while the process is running, so it is not a problem if we "miss" some that are created early. 
> 13:50:01 accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19 > 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, > [12]) = 0 > 13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0 > 13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0 > 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, > [12]) = 0 > 13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 13:50:01 > getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) > = 0 13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL > 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, > msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45 > 13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK > 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, > {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 > 13:50:01 sendmsg(19, {msg_name(0)=NULL, > msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 > > \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", > 128}, > {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0", > 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE > (Broken pipe) > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 > 13:50:01 close(19) = 0 The kernel will use the lower-numbered available fd, so there's lot of "reuse" of the same numbers happening. This strace means that between each of those close()s here, some other function call returned fd 19. Until we know what those calls are, we cannot say why fd19 remains open. (In fact, the only thing we can say for sure, is that the accept4() call shown above is not relevant.) 
Zbyszek
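Zbyszek's point -- that some call must have returned fd 19 again between each pair of close(19) lines -- can be checked mechanically. A sketch (my own parsing assumptions, matching the log format excerpted above):

```python
import re

def creations_between_closes(log_text, fd):
    """Return the log lines where some syscall returned `fd` again after
    it had been close()d -- i.e. the calls that must exist between the
    repeated close() entries.  Assumes strace's "name(args) = ret" format
    with the return value at end of line."""
    close_pat = re.compile(r'\bclose\(%d\)\s*= 0' % fd)
    ret_pat = re.compile(r'=\s*%d\s*$' % fd)
    closed = False
    hits = []
    for line in log_text.splitlines():
        if close_pat.search(line):
            closed = True
        elif closed and ret_pat.search(line):
            hits.append(line)
            closed = False
    return hits
```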
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Jul 31, 2019 at 12:36:41AM +0300, Uoti Urpala wrote: > On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote: > > I see, between 13:49:30 and 13:50:01, I see 25 'successful' calls > > for close(), e.g.: > > > > 13:50:01 close(19) = 0 > > > > Followed by getsockopt(), and a received message on the supposedly-closed > > file descriptor: > > > > 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, > > gid=0}, [12]) = 0 > > Are you sure it's the same file descriptor? You don't explicitly say > anything about there not being any relevant lines between those. Does > systemd really just call getsockopt() on fd 19 after closing it, with > nothing to trigger that? Obvious candidates to check in the strace > would be an accept call returning a new fd 19, or epoll indicating > activity on the fd (though I'd expect systemd to remove the fd from the > epoll set after closing it). My analysis is naive. There was an earlier suggestion to use strace, limiting it to a limited number of system calls. I then used a simple RE to look for the string '(19', to see calls where '19' was used as an initial argument to system calls. That's way too simplistic. To address some of your questions/points. - No, I don't know if it's the same file descriptor. I could not start strace early enough to catch the creation of several dozen file descriptors. - I didn't say anything about lines between those that I cited, as I could not ascertain relevance. - And I completely missed the case of the accept4() calls returning the value of 19, among other cases where '19' shows up as a value. A regex-based search is certainly inconclusive, but now I'm using this: egrep -e '[^0-9:]19(\)|\}|\]|,)?' 
/home/systemd.strace.trimmed | less The rhythm now seems to be more like this: 13:50:01 accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 13:50:01 getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 0 13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45 13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", 128}, {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0", 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe) 13:50:01 close(19) = 0 13:50:01 close(19) = 0 13:50:01 close(19) = 0 13:50:01 close(19) = 0 13:50:01 close(19) = 0 13:50:01 close(19) = 0 13:50:01 close(19) = 0 ... Mind you, I see a _lot_ more close() calls than accepts(): localhost:~ # egrep -e '[^0-9:]19(\)|\}|\]|,)?' 
/home/systemd.strace.trimmed > /home/systemd.strace.trimmed.19 localhost:~ # grep accept4\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- | sort -u accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19 localhost:~ # grep accept4\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- | wc -l 55 localhost:~ # grep close\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- | sort -u close(19) = 0 localhost:~ # grep close\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- | wc -l 1051 I'm not asserting the frequencies are indicative of anything wrong; I'm just more used to a 1:1 correlation. And, again, the age of the file in /proc/1/fd/19 never seems to change: localhost:~ # ls -ld --full-time /proc/1/fd/19 lrwx-- 1 root root 64 2019-07-30 15:45:25.531468318 -0400 /proc/1/fd/19 -> socket:[27085] localhost:~ # date Wed Jul 31 11:31:37 EDT 2019 That may be a red herring. I have been assuming that if an FD was closed, then reopened/recreated by a process, that file would have a new age. > > > ___ > systemd-devel mailing list > systemd-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/systemd-devel -- Brian Reichert BSD admin/developer
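The mismatch (55 accept4() calls vs 1051 close() calls) is partly an artifact of substring matching: '19' also appears in return values, buffer sizes, and struct fields. A stricter count that only matches `syscall(19,` or `syscall(19)` -- a sketch, not tested against the full log:

```python
import re

def count_syscall_on_fd(log_text, syscall, fd):
    # Count only calls where `fd` is the first argument of `syscall`,
    # rather than grepping for the substring "19" anywhere on the line.
    pat = re.compile(r'\b%s\(%d[,)]' % (re.escape(syscall), fd))
    return sum(1 for line in log_text.splitlines() if pat.search(line))
```

Counting re-creations of fd 19 still needs return-value matching (`= 19` at end of line), since accept4()'s first argument is the listening socket, not 19.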
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> I see, between 13:49:30 and 13:50:01, I see 25 'successful' calls
> for close(), e.g.:
>
> 13:50:01 close(19) = 0
>
> Followed by getsockopt(), and a received message on the supposedly-closed
> file descriptor:
>
> 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0

Are you sure it's the same file descriptor? You don't explicitly say anything about there not being any relevant lines between those. Does systemd really just call getsockopt() on fd 19 after closing it, with nothing to trigger that? Obvious candidates to check in the strace would be an accept call returning a new fd 19, or epoll indicating activity on the fd (though I'd expect systemd to remove the fd from the epoll set after closing it).
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Jul 11, 2019 at 08:35:38PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote:
> > Does that sound like expected behavior?
>
> No, this shouldn't happen.
>
> What I was trying to say is that if you have the strace log, you
> can figure out what created the stale connection and what the dbus
> call was, and from all that info it should be fairly simple to figure
> out what the calling command was. Once you have that, it'll be much
> easier to reproduce the issue in a controlled setting and look for the
> fix.

I'm finally revisiting this.

I haven't found a way to get a trace to start early enough to catch the initial open() on all of the targeted file descriptors, but I'm trying to make do with what I have.

To sum up, in my naive analysis, I see close() called many times on a file descriptor. I then see more messages come in on that same descriptor. But the timestamp of the descriptor in /proc never changes.

I created a service to launch strace as early as I can figure:

localhost:~ # cat /usr/lib/systemd/system/systemd_strace.service
[Unit]
Description=strace systemd
DefaultDependencies=no
After=local-fs.target
Before=sysinit.target
ConditionPathExists=!/etc/initrd-release

[Service]
ExecStart=/usr/bin/strace -p1 -t -o /home/systemd.strace -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
ExecStop=/bin/echo systemd_strace.service will soon exit
Type=simple

[Install]
WantedBy=multi-user.target

I introduced the '-t' flag, so I'd get timestamps on the recorded entries.

I rebooted the server, and after ~20 minutes, I found stale descriptors that seem to date to when the host first booted. Note their age relative to the boot time, and that they have no connected peers.
localhost:~ # uptime 14:10pm up 0:21, 3 users, load average: 0.81, 0.24, 0.15 localhost:~ # date Tue Jul 30 14:10:09 EDT 2019 localhost:~ # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "", $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | tail -5 lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 14 -> socket:[28742] lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 16 -> socket:[35430] lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 17 -> socket:[37758] lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 18 -> socket:[41044] lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 19 -> socket:[43411] localhost:~ # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l 0 This is an XFS filesystem, so I can't directly get the creation time of my trace file, but I can see the first entry is timestamped '13:49:07'. I copied the trace file aside, and edited that copy to trim everything off after 14:10:09, when I ran that 'date' command above. As early as I tried to start this trace, dozens of file descriptors had already been created. 
Trying to focus on FD 19 (the oldest connection to /run/systemd/private): I see, between 13:49:30 and 13:50:01, I see 25 'successful' calls for close(), e.g.: 13:50:01 close(19) = 0 Followed by getsockopt(), and a received message on the supposedly-closed file descriptor: 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0 13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 13:50:01 getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 0 13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45 13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0\0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", 128}, {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0", 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe) I see a continuous stream of messages coming in on FD 19, though the end of the trace, but the age of the file descriptor in /proc never seems to change. Am I misinterpreting something? > Zbyszek -- Brian Reichert BSD admin/developer at large ___ systemd-devel mailing list
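One aside on the recvmsg() payload: the `\0AUTH EXTERNAL 30` is the D-Bus SASL handshake, and with the EXTERNAL mechanism the argument is the client's uid as a hex-encoded ASCII string, so "30" decodes to uid "0" -- consistent with the SO_PEERCRED result (uid=0) above. A small decoding aid (not from the thread):

```python
# "AUTH EXTERNAL 30" -> the byte 0x30 is ASCII "0", i.e. the client
# authenticated as uid 0 (root).
def decode_external_auth(line):
    parts = line.strip().split()
    if parts[:2] != ['AUTH', 'EXTERNAL']:
        raise ValueError('not an EXTERNAL auth line')
    return bytes.fromhex(parts[2]).decode('ascii')
```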
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote: > On Wed, Jul 10, 2019 at 10:44:14PM +, Zbigniew J??drzejewski-Szmek wrote: > > That's ancient... 228 was released almost four years ago. > > That's the joy of using a commercial Linux distribution; they tend > to be conservative about updates. SLES may very well have backported > fixes to the packaged version they maintain. > > They may also have a newer version of a systemd RPM for us to take. > > I'm looking for an efficient way to repro the symptoms, as to confirm > whether a newer RPM solves this for us. > > > > > > When we first spin up a new SLES12 host with our custom services, > > > > > the number of connections to /run/systemd/private numbers in the > > > > > mere hundreds. > > > > > > > That sounds wrong already. Please figure out what those connections > > > > are. I'm afraid that you might have to do some debugging on your > > > > own, since this issue doesn't seem easily reproducible. > > Above, I cite a desire for reproducing the symptoms. If you're > confident that a newly-spun-up idle host should not hover at hundreds > of connections, then hypothetically I could update the vendor-provided > systemd RPM (if there is one), reboot, and see if the connection > count is reduced. > > > strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999 > > > > yields the relevant info. In particular, the pid, uid, and guid of the > > remote is shown. My approach would be to log this to some file, and > > then see which fds remain, and then look up this fd in the log. > > The recvmsg calls contain the serialized dbus calls, a bit messy but > > understandable. E.g. 'systemctl show systemd-udevd' gives something > > like this: > > Thanks for such succinct feedback; I'll see what I can get from this. > > In my prior email, I showed how some of the connections were > hours/days old, even with no connecting peer. > > Does that sound like expected behavior? No, this shouldn't happen. 
What I was trying to say is that if you have the strace log, you can figure out what created the stale connection and what the dbus call was, and from all that info it should be fairly simple to figure out what the calling command was. Once you have that, it'll be much easier to reproduce the issue in a controlled setting and look for the fix.

Zbyszek
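For the "figure out what created the stale connection" step: the SO_PEERCRED getsockopt() lines in the trace already carry the peer's pid/uid/gid, so they can be pulled out per fd. A sketch (the parsing assumptions are mine, matched to the excerpts in this thread):

```python
import re

# Extract (pid, uid, gid) of the peer from SO_PEERCRED getsockopt() lines
# for a given fd, so surviving fds can be traced back to a caller.
PEERCRED_RE = re.compile(
    r'getsockopt\((\d+), SOL_SOCKET, SO_PEERCRED, '
    r'\{pid=(\d+), uid=(\d+), gid=(\d+)\}')

def peers_for_fd(log_text, fd):
    return [(int(m.group(2)), int(m.group(3)), int(m.group(4)))
            for m in PEERCRED_RE.finditer(log_text)
            if int(m.group(1)) == fd]
```

The pid can then be matched against a ps snapshot or audit log taken around the same time; by the time a leak is noticed, the peer process has usually exited.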
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Jul 10, 2019 at 10:44:14PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> That's ancient... 228 was released almost four years ago.

That's the joy of using a commercial Linux distribution; they tend to be conservative about updates. SLES may very well have backported fixes to the packaged version they maintain.

They may also have a newer version of a systemd RPM for us to take.

I'm looking for an efficient way to reproduce the symptoms, to confirm whether a newer RPM solves this for us.

> > > > When we first spin up a new SLES12 host with our custom services,
> > > > the number of connections to /run/systemd/private numbers in the
> > > > mere hundreds.
>
> > > That sounds wrong already. Please figure out what those connections
> > > are. I'm afraid that you might have to do some debugging on your
> > > own, since this issue doesn't seem easily reproducible.

Above, I cite a desire for reproducing the symptoms. If you're confident that a newly-spun-up idle host should not hover at hundreds of connections, then hypothetically I could update the vendor-provided systemd RPM (if there is one), reboot, and see if the connection count is reduced.

> strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
>
> yields the relevant info. In particular, the pid, uid, and gid of the
> remote is shown. My approach would be to log this to some file, and
> then see which fds remain, and then look up this fd in the log.
> The recvmsg calls contain the serialized dbus calls, a bit messy but
> understandable. E.g. 'systemctl show systemd-udevd' gives something
> like this:

Thanks for such succinct feedback; I'll see what I can get from this.

In my prior email, I showed how some of the connections were hours/days old, even with no connecting peer.

Does that sound like expected behavior?
> HTH,
> Zbyszek

-- 
Brian Reichert
BSD admin/developer at large
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Jul 10, 2019 at 09:51:36AM -0400, Brian Reichert wrote:
> On Wed, Jul 10, 2019 at 07:37:19AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>
> > It's a bug report as any other. Writing a meaningful reply takes time
> > and effort. Lack of time is a much better explanation than resentment.
>
> I wasn't expressing resentment; I apologize if it came off that way.
>
> > Please always specify the systemd version in use. We're not all SLES
> > users, and even if we were, I assume that there might be different
> > package versions over time.
>
> Quite reasonable:
>
> localhost:/var/tmp # cat /etc/os-release
> NAME="SLES"
> VERSION="12-SP3"
> VERSION_ID="12.3"
> PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
> ID="sles"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:12:sp3"
>
> localhost:/var/tmp # rpm -q systemd
> systemd-228-142.1.x86_64

That's ancient... 228 was released almost four years ago.

> > > When we first spin up a new SLES12 host with our custom services,
> > > the number of connections to /run/systemd/private numbers in the
> > > mere hundreds.
>
> > That sounds wrong already. Please figure out what those connections
> > are. I'm afraid that you might have to do some debugging on your
> > own, since this issue doesn't seem easily reproducible.
>
> What tactics should I employ? All of those file handles to
> /run/systemd/private are owned by PID 1, and 'ss' implies there are
> no peers.
>
> 'strace' on PID 1 shows messages are flowing, but that doesn't reveal
> the logic about how the connections get created or culled, nor who
> initiated them.

strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999

yields the relevant info. In particular, the pid, uid, and gid of the remote is shown. My approach would be to log this to some file, and then see which fds remain, and then look up this fd in the log. The recvmsg calls contain the serialized dbus calls, a bit messy but understandable. E.g. 'systemctl show systemd-udevd' gives something like this:

recvmsg(20, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\4\1\5\0\0\0\1\0\0\0\257\0\0\0\1\1o\08\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
recvmsg(20, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/systemd_2dudevd_2eservice\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0GetAll\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0\0\0\0\0\0", ...

HTH,
Zbyszek
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Wed, Jul 10, 2019 at 07:37:19AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> It's a bug report as any other. Writing a meaningful reply takes time
> and effort. Lack of time is a much better explanation than resentment.

I wasn't expressing resentment; I apologize if it came off that way.

> Please always specify the systemd version in use. We're not all SLES
> users, and even if we were, I assume that there might be different
> package versions over time.

Quite reasonable:

localhost:/var/tmp # cat /etc/os-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp3"

localhost:/var/tmp # rpm -q systemd
systemd-228-142.1.x86_64

> > When we first spin up a new SLES12 host with our custom services,
> > the number of connections to /run/systemd/private numbers in the
> > mere hundreds.
>
> That sounds wrong already. Please figure out what those connections
> are. I'm afraid that you might have to do some debugging on your
> own, since this issue doesn't seem easily reproducible.

What tactics should I employ? All of those file handles to /run/systemd/private are owned by PID 1, and 'ss' implies there are no peers.

'strace' on PID 1 shows messages are flowing, but that doesn't reveal the logic about how the connections get created or culled, nor who initiated them.
On a box with ~500 of these file handles, I can see that many of them are hours or days old:

localhost:/var/tmp # date
Wed Jul 10 09:45:01 EDT 2019

# new ones
localhost:/var/tmp # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "", $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | head -5
lrwx-- 1 root root 64 2019-07-10 09:45:05.211722809 -0400 561 -> socket:[1183838]
lrwx-- 1 root root 64 2019-07-10 09:40:02.611726025 -0400 559 -> socket:[1173429]
lrwx-- 1 root root 64 2019-07-10 09:40:02.611726025 -0400 560 -> socket:[1176265]
lrwx-- 1 root root 64 2019-07-10 09:33:10.687730403 -0400 100 -> socket:[113992]
lrwx-- 1 root root 64 2019-07-10 09:33:10.687730403 -0400 101 -> socket:[115163]
xargs: ls: terminated by signal 13

# old ones
localhost:/var/tmp # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "", $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | tail -5
lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 59 -> socket:[43097]
lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 60 -> socket:[44029]
lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 63 -> socket:[46234]
lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 65 -> socket:[49252]
lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 71 -> socket:[54064]

> > Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private
> > correct?
>
> Yes. The number is hardcoded because it's expected to be "large
> enough". The connection count shouldn't be more than "a few" or maybe
> a dozen at any time.

Thanks for confirming that.

> Zbyszek

-- 
Brian Reichert
BSD admin/developer at large
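The `lsof | ... | xargs ls -t --full-time` pipeline can be post-processed to list fds oldest-first with their timestamps. A sketch that parses the listing text above (the regex is my own assumption, matched to GNU `ls --full-time` output):

```python
import re
from datetime import datetime

# Parse lines like:
#   lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 59 -> socket:[43097]
# into (timestamp, fd, target) tuples, sorted oldest first.
LINE_RE = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ [-+]\d{4} (\d+) -> (\S+)')

def fd_ages(listing):
    rows = []
    for line in listing.splitlines():
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
            rows.append((ts, int(m.group(2)), m.group(3)))
    return sorted(rows)
```

(Note the thread's own caveat: the /proc timestamp may be a red herring, since it is not clear it changes when an fd number is reused.)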
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Tue, Jul 02, 2019 at 09:57:44AM -0400, Brian Reichert wrote: > At $JOB, on some of our SLES12 boxes, our logs are getting swamped > with messages saying: > > "Too many concurrent connections, refusing" Please always specify the systemd version in use. We're not all SLES users, and even if we were, I assume that there might be different package versions over time. > # ss -x | grep /run/systemd/private | wc -l > 4015 /run/systemd/private is used by systemctl and other systemd utilities when running as root. Those connections are expected to be short-lived. Generally, on a normal machine "ss -x | grep /run/systemd/private | wc -l" is expected to yield 0 or a very low number transiently. > But, despite the almost 4k connections, 'ss' shows that there are > no connected peers: > > # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l > 0 Interesting. ss output is not documented at all from what I can see, but indeed '* 0' seems to indicate that. It is possible that systemd has a leak and is not closing the private bus connections properly. > When we first spin up a new SLES12 host with our custom services, > the number of connections to /run/systemd/private numbers in the > mere hundreds. That sounds wrong already. Please figure out what those connections are. I'm afraid that you might have to do some debugging on your own, since this issue doesn't seem easily reproducible. (I installed systemd with CONNECTIONS_MAX set to 10, and I can easily saturate the number of available connections with for i in {1..11}; do systemctl status '*' & sleep 0.5; kill -STOP $!;done As soon as I allow the processes to continue or kill them, the connection count goes down. They never show up with '* 0'.) > Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private > correct? Yes. The number is hardcoded because it's expected to be "large enough". The connection count shouldn't be more than "a few" or maybe a dozen at any time. 
> I have a hypothesis that this may be some resource leak in systemd,
> but I've not found a way to test that.

Once you figure out what is creating the connection, it would be useful to attach strace to pid 1 and see what is happening there.

Zbyszek
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Tue, Jul 02, 2019 at 09:57:44AM -0400, Brian Reichert wrote:
> At $JOB, on some of our SLES12 boxes, our logs are getting swamped
> with messages saying:
>
> "Too many concurrent connections, refusing"
>
> It's hampering our ability to manage services, e.g.:
>
> # systemctl status ntpd
> Failed to get properties: Connection reset by peer

Can anyone at least confirm that the CONNECTIONS_MAX limit in dbus.c does relate to the number of connections systemd has to /run/systemd/private?

We can't even reliably reboot the server in question under these circumstances...

-- 
Brian Reichert
BSD admin/developer at large
[systemd-devel] systemd's connections to /run/systemd/private ?
At $JOB, on some of our SLES12 boxes, our logs are getting swamped with messages saying:

"Too many concurrent connections, refusing"

It's hampering our ability to manage services, e.g.:

# systemctl status ntpd
Failed to get properties: Connection reset by peer

Near as I can tell from a quick read of the source of dbus.c, we're hitting a hard-coded limit of CONNECTIONS_MAX (set to 4096). I think this is related to the number of connections systemd (pid 1) has to /run/systemd/private, but I'm guessing here:

# ss -x | grep /run/systemd/private | wc -l
4015

But, despite the almost 4k connections, 'ss' shows that there are no connected peers:

# ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
0

The symptom here is that, depending on system activity, systemd stops being able to process new requests. systemd allows requests to come in (e.g. via an invocation of 'systemctl'), but if I understand the source of dbus.c, when there are too many connections to its private socket, systemd rejects the attempts, apparently with no retry.

When we first spin up a new SLES12 host with our custom services, the number of connections to /run/systemd/private numbers in the mere hundreds. As workloads increase, the number of connections rises into the thousands. Some hosts are plagued with the 'Too many concurrent connections' messages; some are not. Empirically, all I've been able to see is that the number of systemd's connections to /run/systemd/private tips over 4k.

Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private correct?

- I can't demonstrate that there are any consumers of this stream.
- I can't explain why the connection count increases over time.
- The CONNECTIONS_MAX constant is hard-coded, and it gets increased every few months/years, but is never exposed as something you can set in a config file.
- I don't know what tunables affect the lifetime/culling of those connections.
I have a hypothesis that this may be some resource leak in systemd, but I've not found a way to test that.

-- 
Brian Reichert
BSD admin/developer at large
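For monitoring, the ss/grep pipeline above can be wrapped into a single check that warns before the hard-coded limit is hit. A sketch; the '* 0' no-peer convention and the 4096 value are taken from this thread, everything else (thresholds, column handling) is my assumption:

```python
# Given the text output of `ss -x`, count connections on
# /run/systemd/private, how many of them have no peer (peer column
# "* 0"), and whether the count is nearing the hard-coded limit
# (CONNECTIONS_MAX = 4096 in the dbus.c of the systemd version
# discussed here).
CONNECTIONS_MAX = 4096

def private_socket_stats(ss_output):
    total = peerless = 0
    for line in ss_output.splitlines():
        if '/run/systemd/private' not in line:
            continue
        total += 1
        if line.rstrip().endswith('* 0'):
            peerless += 1
    warn = total >= CONNECTIONS_MAX * 0.9
    return total, peerless, warn
```

Run periodically (feeding it `subprocess.run(['ss', '-x'], ...)` output), this would flag hosts drifting toward the "Too many concurrent connections" state before systemctl starts failing.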