Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-15 Thread Michael Chapman
On Fri, 16 Aug 2019, Uoti Urpala wrote:
> On Thu, 2019-08-15 at 20:36 +1000, Michael Chapman wrote:
> > With systemd 239 I was unable to cause an fd leak this way.
> > 
> > Still, I would feel more comfortable if I could find a commit that 
> > definitely fixed the problem. All of these experiments are just 
> > circumstantial evidence.
> 
> 5ae37ad833583e6c1c7765767b7f8360afca3b07
> sd-bus: when attached to an sd-event loop, disconnect on processing errors

Ah, nice. So since systemd v237.

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-15 Thread Uoti Urpala
On Thu, 2019-08-15 at 20:36 +1000, Michael Chapman wrote:
> With systemd 239 I was unable to cause an fd leak this way.
> 
> Still, I would feel more comfortable if I could find a commit that 
> definitely fixed the problem. All of these experiments are just 
> circumstantial evidence.

5ae37ad833583e6c1c7765767b7f8360afca3b07
sd-bus: when attached to an sd-event loop, disconnect on processing errors



Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-15 Thread Michael Chapman
On Thu, 15 Aug 2019, Lennart Poettering wrote:
> On Mi, 14.08.19 22:36, Michael Chapman (m...@very.puzzling.org) wrote:
> 
> > On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > > Well, a D-Bus connection can remain open indefinitely, and may even
> > > have incomplete "half" messages queued in them as long as the client
> > > desires. After the initial authentication is done, clients may thus
> > > take up resources as long as they want, this is by design of dbus
> > > really, and is different from HTTP for example, where connections
> > > usually have a time-out applied. dbus doesn't know timeouts for
> > > established connections. It knows them for the authentication phase,
> > > and it knows them for method calls that are in flight, but it does not
> > > know them for the mere existence of an established connection.
> >
> > I'm sure it's not in the design of DBus that clients can continue to
> > consume those resources after they've disconnected.
> >
> > > PID 1 authenticates clients of the private connection simply by making
> > > the socket for it inaccessible to anyone who is not privileged. Due to
> > > that it gets away with not doing any further per-user accounting,
> > > because it knows the clients are all privileged anyway.
> > >
> > > So, yes, it would be good if we could protect us from any form of
> > > misuse, but basically, if you have a root client that misbehaves you
> > > have to accept that...
> >
> > I understand all that. Nevertheless, Brian identified a bug: after
> > receiving certain data on its private socket, the systemd process can leak
> > a file descriptor.
> 
> Can it? Did I miss something? If the client closes the client side of
> the socket, but PID 1 would keep the server side of it open anyway,
> then that would be a bug indeed. But my understanding was that the
> client side stays pinned?

I was able to reproduce the bug on CentOS 7's systemd 219. That is, the 
file descriptor in PID 1 was dropped from its epoll set without it 
reaching EOF and without it being closed. Every time I ran Brian's command 
PID 1 would leak another file descriptor.

I was unable to reproduce this on a later version of systemd, but that 
_could_ just be because this later version of systemd side-steps the issue 
by ensuring that systemctl doesn't use fd 1 for its socket.

I have some reason to believe the problem in PID 1 has been fixed though. 
On CentOS 7 I was able to cause it to sometimes leak an fd simply by 
sending random data to it:

  # count-sockets() { ss -x | grep /run/systemd/private | wc -l; }
  # inject-junk() { timeout 1s nc -U /run/systemd/private
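A rough sketch of the experiment described here, assuming random bytes from
/dev/urandom as the junk payload (the original command was truncated above):

  # hypothetical reconstruction: feed arbitrary bytes into the private
  # socket, then check whether PID 1's connection count has grown
  count-sockets() { ss -x | grep /run/systemd/private | wc -l; }
  inject-junk() { head -c 512 /dev/urandom | timeout 1s nc -U /run/systemd/private; }

  before=$(count-sockets)
  inject-junk
  after=$(count-sockets)
  echo "before=$before after=$after"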

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-15 Thread Lennart Poettering
On Mi, 14.08.19 22:36, Michael Chapman (m...@very.puzzling.org) wrote:

> On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > Well, a D-Bus connection can remain open indefinitely, and may even
> > have incomplete "half" messages queued in them as long as the client
> > desires. After the initial authentication is done, clients may thus
> > take up resources as long as they want, this is by design of dbus
> > really, and is different from HTTP for example, where connections
> > usually have a time-out applied. dbus doesn't know timeouts for
> > established connections. It knows them for the authentication phase,
> > and it knows them for method calls that are in flight, but it does not
> > know them for the mere existence of an established connection.
>
> I'm sure it's not in the design of DBus that clients can continue to
> consume those resources after they've disconnected.
>
> > PID 1 authenticates clients of the private connection simply by making
> > the socket for it inaccessible to anyone who is not privileged. Due to
> > that it gets away with not doing any further per-user accounting,
> > because it knows the clients are all privileged anyway.
> >
> > So, yes, it would be good if we could protect us from any form of
> > misuse, but basically, if you have a root client that misbehaves you
> > have to accept that...
>
> I understand all that. Nevertheless, Brian identified a bug: after
> receiving certain data on its private socket, the systemd process can leak
> a file descriptor.

Can it? Did I miss something? If the client closes the client side of
the socket, but PID 1 would keep the server side of it open anyway,
then that would be a bug indeed. But my understanding was that the
client side stays pinned?

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Dave Howorth
On Wed, 14 Aug 2019 12:04:03 -0400
Brian Reichert  wrote:
> On Wed, Aug 14, 2019 at 04:19:46PM +0100, Simon McVittie wrote:
> > On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> > Doesn't daemonize(1) make stdin, stdout and stderr point
> > to /dev/null, instead of closing them?  
> 
> Looking at the source, yes, it does.
> 
> > Expecting arbitrary subprocesses to cope gracefully with being
> > invoked without the three standard fds seems likely to be a losing
> > battle. I've implemented this myself, in dbus; it isn't a whole lot
> > of code, but it also isn't something that I would expect the
> > authors of all CLI tools to get right.  
> 
> I concede that reopening FD 0,1,2 is a good practice to insulate
> against the issues you cite.
> 
> I agree with your points; I code aggressively, and sometimes forget
> others don't.

I didn't know what you meant by this. Do you mean 'Aggressive
Programming'? Is
https://www.apharmony.com/software-sagacity/2014/10/principles-of-aggressive-programming/
a reasonable summary?

> > smcv
> > 
> > [1] I'm sure there are lots of other executables named daemon or
> > daemonize in other OSs, and perhaps some of them get this wrong?
> 
> -- 
> Brian Reichert
> BSD admin/developer at large  

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Brian Reichert
On Wed, Aug 14, 2019 at 04:19:46PM +0100, Simon McVittie wrote:
> On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> Doesn't daemonize(1) make stdin, stdout and stderr point to /dev/null,
> instead of closing them?

Looking at the source, yes, it does.

> Expecting arbitrary subprocesses to cope gracefully with being invoked
> without the three standard fds seems likely to be a losing battle.
> I've implemented this myself, in dbus; it isn't a whole lot of code,
> but it also isn't something that I would expect the authors of all CLI
> tools to get right.

I concede that reopening FD 0,1,2 is a good practice to insulate
against the issues you cite.

I agree with your points; I code aggressively, and sometimes forget
others don't.

> smcv
> 
> [1] I'm sure there are lots of other executables named daemon or daemonize
> in other OSs, and perhaps some of them get this wrong?

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Simon McVittie
On Wed, 14 Aug 2019 at 10:26:53 -0400, Brian Reichert wrote:
> And, if I were to ever use daemonize(1), or any other other canonical
> mechanism for daemonizing code, STDOUT would normally be closed
> under those circumstances, as well.

Doesn't daemonize(1) make stdin, stdout and stderr point to /dev/null,
instead of closing them? That's what glibc daemon(3) does, and seems
a lot better as a way to not break subprocesses' expectations. The man
pages of the daemon(1) and daemonize(1) in Debian[1] also both say they
do this, and it's what is recommended in systemd's own daemon(7).
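A minimal shell sketch of the difference:

  # closing the fd outright: the next file or socket the program opens will
  # silently become its "stdout"
  exec 1>&-

  # what daemon(3), daemonize(1) and daemon(7) recommend: keep fds 0, 1, 2
  # occupied by pointing them at /dev/null
  exec </dev/null >/dev/null 2>&1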

Expecting arbitrary subprocesses to cope gracefully with being invoked
without the three standard fds seems likely to be a losing battle.
I've implemented this myself, in dbus; it isn't a whole lot of code,
but it also isn't something that I would expect the authors of all CLI
tools to get right. dbus only does this because X11 autolaunching can
lead to dbus-daemon being invoked in a broken execution environment, and
I would have considered the equivalent bug in most tools to be WONTFIX.

Expecting arbitrary *library code* to cope gracefully with being invoked
without the three standard fds seems like a non-starter, because these
fds are process-global state, so library code cannot safely mess with
them unless specifically asked to do so by the calling process.

In general library code can't do this safely anyway, because as soon as
the process has a second thread, it becomes impossible to fix the fds
in one thread without risking breaking another thread - this is one of
the operations that is usually only safe to do right at the beginning
of main(), similar to setenv().

smcv

[1] I'm sure there are lots of other executables named daemon or daemonize
in other OSs, and perhaps some of them get this wrong?

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Lennart Poettering
On Mi, 14.08.19 10:26, Brian Reichert (reich...@numachi.com) wrote:

> On Wed, Aug 14, 2019 at 11:34:21AM +0200, Lennart Poettering wrote:
> > Hence: your code that closes fd1 like this is simply buggy. Don't do
> > that, you are shooting yourself in the foot.
>
> Buggy or no, this is fifteen-year-old code, and prior cron/service
> mgmt framework implementations had no issue.
>
> And, if I were to ever use daemonize(1), or any other other canonical
> mechanism for daemonizing code, STDOUT would normally be closed
> under those circumstances, as well.

In daemons, stdin/stdout/stderr are generally closed and then opened
as /dev/null again, precisely to avoid problems like that, so that
fd0,1,2 are taken and any future fd allocations are > 2. This is what
BSD/glibc daemon(3) does for example, and most daemons do it like
that.

printf() and friends assume that fd1 is stdout.
fgets() and friends assume that fd0 is stdin.
error() and friends assume that fd2 is stderr.

Most command line programs (such as ls, echo or systemctl) assume that
fd0, 1, 2 are defined that way, and they will do their output and
input according to that. If you close these fds before invoking these
programs they will do borked stuff too...

Example:

   telnet www.microsoft.de 80 2>&1-

See, this will connect to port 80, and do so with a socket which is
allocated as fd1, because we called this tool with fd1 closed. Thus it
interferes with stdout, and that one line string that says "Trying
…\nConnected to …\r\nEscape character is '^]'" will be written to the
peer instead of the local tty!

That works with any tool. If you invoke a tool with
stdout/stderr/stdin not connected, weird shit happens as soon as the
app allocates the first fd. systemctl is just one of many.
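A quick way to see this (a sketch, assuming strace is available and cat as the
victim tool):

  sh -c 'exec 1>&-; strace -e trace=openat,write cat /etc/hostname' 2>&1
  # expected: the final openat() of /etc/hostname returns 1, i.e. the data
  # file has become "stdout", and cat's write(1, ...) then fails with EBADF
  # because that descriptor is open read-only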

Note that the robustness we added to sd-bus, which explicitly moves the
fd to >= 3, is not supposed to make things work in cases where systemctl
is called with closed stdin, stdout or stderr. Instead, we made this
change since the sd-bus code is used in NSS clients, i.e. as backend
for gethostbyname(), and thus we really don't know in which kind of
contexts it is used. i.e. it might very well be called inside a
process that is currently rearranging stdin/stdout/stderr and thus
might have closed them temporarily.

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Brian Reichert
On Wed, Aug 14, 2019 at 11:34:21AM +0200, Lennart Poettering wrote:
> Hence: your code that closes fd1 like this is simply buggy. Don't do
> that, you are shooting yourself in the foot.

Buggy or no, this is fifteen-year-old code, and prior cron/service
mgmt framework implementations had no issue.

And, if I were to ever use daemonize(1), or any other other canonical
mechanism for daemonizing code, STDOUT would normally be closed
under those circumstances, as well.

I'm wading into hypotheticals here, but daemonized code should, in
turn, be able to invoke whatever code it subsequently wants.

> Lennart
> 
> --
> Lennart Poettering, Berlin

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Michael Chapman
On Wed, 14 Aug 2019, Lennart Poettering wrote:
> Well, a D-Bus connection can remain open indefinitely, and may even
> have incomplete "half" messages queued in them as long as the client
> desires. After the initial authentication is done, clients may thus
> take up resources as long as they want, this is by design of dbus
> really, and is different from HTTP for example, where connections
> usually have a time-out applied. dbus doesn't know timeouts for
> established connections. It knows them for the authentication phase,
> and it knows them for method calls that are in flight, but it does not
> know them for the mere existence of an established connection.

I'm sure it's not in the design of DBus that clients can continue to 
consume those resources after they've disconnected.

> PID 1 authenticates clients of the private connection simply by making
> the socket for it inaccessible to anyone who is not privileged. Due to
> that it gets away with not doing any further per-user accounting,
> because it knows the clients are all privileged anyway.
> 
> So, yes, it would be good if we could protect us from any form of
> misuse, but basically, if you have a root client that misbehaves you
> have to accept that...

I understand all that. Nevertheless, Brian identified a bug: after 
receiving certain data on its private socket, the systemd process can leak 
a file descriptor.

All we need to establish is whether that bug still exists in the current 
version of systemd. _Even if_ it isn't a security issue, having one fewer 
bugs is a good thing.

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Lennart Poettering
On Mi, 14.08.19 19:47, Michael Chapman (m...@very.puzzling.org) wrote:

> On Wed, 14 Aug 2019, Lennart Poettering wrote:
> > Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
> > really bad idea in general. That systemctl nowadays is particularly
> > careful and deals with situations like that is not an invitation to
> > actually invoke things like this. After all UNIX guarantees that
> > open() (and the other calls that allocate an fd) allocated the lowest
> > available one, hence they will necessarily step into
> > stdin/stdout/stderr territory when invoked with any of those fds
> > closed.
> >
> > Hence: your code that closes fd1 like this is simply buggy. Don't do
> > that, you are shooting yourself in the foot. And even if newer
> > systemctl will handle cases like this more gracefully pretty much any
> > other tool you might call will break in some form or another too,
> > since a simple printf() will already spill into the wrong fds in that
> > case.
>
> That's all true, but the thing we need to check here is that systemd
> correctly handles junk on the /run/systemd/private socket. The change on
> the systemctl side certainly tries to prevent incorrect data being sent
> down the socket -- though it looks like there's several ways in which
> fd_move_above_stdio() can fail, so this isn't foolproof -- but we need to
> ensure that some _malicious_ client can't DoS systemd.

Well, a D-Bus connection can remain open indefinitely, and may even
have incomplete "half" messages queued in them as long as the client
desires. After the initial authentication is done, clients may thus
take up resources as long as they want, this is by design of dbus
really, and is different from HTTP for example, where connections
usually have a time-out applied. dbus doesn't know timeouts for
established connections. It knows them for the authentication phase,
and it knows them for method calls that are in flight, but it does not
know them for the mere existence of an established connection.

PID 1 authenticates clients of the private connection simply by making
the socket for it inaccessible to anyone who is not privileged. Due to
that it gets away with not doing any further per-user accounting,
because it knows the clients are all privileged anyway.

So, yes, it would be good if we could protect us from any form of
misuse, but basically, if you have a root client that misbehaves you
have to accept that...

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Michael Chapman
On Wed, 14 Aug 2019, Lennart Poettering wrote:
> Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
> really bad idea in general. That systemctl nowadays is particularly
> careful and deals with situations like that is not an invitation to
> actually invoke things like this. After all UNIX guarantees that
> open() (and the other calls that allocate an fd) allocated the lowest
> available one, hence they will necessarily step into
> stdin/stdout/stderr territory when invoked with any of those fds
> closed.
> 
> Hence: your code that closes fd1 like this is simply buggy. Don't do
> that, you are shooting yourself in the foot. And even if newer
> systemctl will handle cases like this more gracefully pretty much any
> other tool you might call will break in some form or another too,
> since a simple printf() will already spill into the wrong fds in that
> case.
> 
> Lennart

That's all true, but the thing we need to check here is that systemd 
correctly handles junk on the /run/systemd/private socket. The change on 
the systemctl side certainly tries to prevent incorrect data being sent 
down the socket -- though it looks like there's several ways in which 
fd_move_above_stdio() can fail, so this isn't foolproof -- but we need to 
ensure that some _malicious_ client can't DoS systemd.

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Lennart Poettering
On Mi, 14.08.19 18:53, Michael Chapman (m...@very.puzzling.org) wrote:

> On Wed, 14 Aug 2019, Michael Chapman wrote:
> > On Wed, 14 Aug 2019, Brian Reichert wrote:
> > > On Thu, Aug 01, 2019 at 07:18:20PM +, Zbigniew Jędrzejewski-Szmek 
> > > wrote:
> > > > Yes. (With the caveat that there *are* legitimate reasons to have new
> > > > long-lived fds created, so not every long-lived fd is "wrong".)
> > >
> > > I finally was able to track down what's happening on my system.
> > >
> > > This is sufficient to reproduce the effect of increasing the number
> > > of file descriptors open to /run/systemd/private; at least, on my
> > > box, in its current state:
> > >
> > >   sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'
> >
> > I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's
> > systemd 239.
>
> Looking in to this further, my guess is that this is fixed in:
>
>   commit 7fe2903c2315db9d9f501ae255a6b6e4f01ba46c
>   Author: Lennart Poettering 
>   Date:   Fri Feb 9 17:53:28 2018 +0100
>
>   fd-util: move certain fds above fd #2 (#8129)
>
> which was introduced in systemd 238.
>
> What's happening is that systemctl is getting fd 1 for the socket to
> /run/systemd/private, as it's the lowest unused fd, but fd 1 (aka stdout)
> is also where it sends its regular human-readable output. This commit
> ensures that the socket is moved to a file descriptor that can't be
> mistaken for a standard stream.
>
> I'm not sure if systemd >= 238 itself still suffers from the problem where
> any "extra" data on that connection causes it to leak its file descriptor.
> That may still be a problem, or it may have been fixed by a different
> commit.

Quite frankly, invoking generic UNIX programs with fds < 3 closed is a
really bad idea in general. That systemctl nowadays is particularly
careful and deals with situations like that is not an invitation to
actually invoke things like this. After all UNIX guarantees that
open() (and the other calls that allocate an fd) allocated the lowest
available one, hence they will necessarily step into
stdin/stdout/stderr territory when invoked with any of those fds
closed.

Hence: your code that closes fd1 like this is simply buggy. Don't do
that, you are shooting yourself in the foot. And even if newer
systemctl will handle cases like this more gracefully pretty much any
other tool you might call will break in some form or another too,
since a simple printf() will already spill into the wrong fds in that
case.

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Michael Chapman
On Wed, 14 Aug 2019, Michael Chapman wrote:
> On Wed, 14 Aug 2019, Brian Reichert wrote:
> > On Thu, Aug 01, 2019 at 07:18:20PM +, Zbigniew Jędrzejewski-Szmek 
> > wrote:
> > > Yes. (With the caveat that there *are* legitimate reasons to have new
> > > long-lived fds created, so not every long-lived fd is "wrong".)
> > 
> > I finally was able to track down what's happening on my system.
> > 
> > This is sufficient to reproduce the effect of increasing the number
> > of file descriptors open to /run/systemd/private; at least, on my
> > box, in its current state:
> > 
> >   sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'
> 
> I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's 
> systemd 239.

Looking in to this further, my guess is that this is fixed in:

  commit 7fe2903c2315db9d9f501ae255a6b6e4f01ba46c
  Author: Lennart Poettering 
  Date:   Fri Feb 9 17:53:28 2018 +0100

  fd-util: move certain fds above fd #2 (#8129)

which was introduced in systemd 238.

What's happening is that systemctl is getting fd 1 for the socket to 
/run/systemd/private, as it's the lowest unused fd, but fd 1 (aka stdout) 
is also where it sends its regular human-readable output. This commit 
ensures that the socket is moved to a file descriptor that can't be 
mistaken for a standard stream.
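A sketch of how to check which descriptor the private-socket connection ends
up on (assuming a root shell, strace, and tmp.mount as an arbitrary example
unit):

  # on a fixed systemctl the socket is duplicated above fd 2 (an F_DUPFD
  # fcntl); on systemd 219 it simply stays on fd 1
  sh -c 'exec 1>&-; strace -e trace=socket,connect,fcntl systemctl status tmp.mount' 2>&1 \
      | grep -E 'systemd/private|F_DUPFD'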

I'm not sure if systemd >= 238 itself still suffers from the problem where 
any "extra" data on that connection causes it to leak its file descriptor. 
That may still be a problem, or it may have been fixed by a different 
commit.

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-14 Thread Michael Chapman
On Wed, 14 Aug 2019, Brian Reichert wrote:
> On Thu, Aug 01, 2019 at 07:18:20PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > Yes. (With the caveat that there *are* legitimate reasons to have new
> > long-lived fds created, so not every long-lived fd is "wrong".)
> 
> I finally was able to track down what's happening on my system.
> 
> This is sufficient to reproduce the effect of increasing the number
> of file descriptors open to /run/systemd/private; at least, on my
> box, in its current state:
> 
>   sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'

I can reproduce this on CentOS 7's systemd 219, but not on Fedora 29's 
systemd 239.

On CentOS 7 I took two `strace -e desc -p 1` runs, comparing:

  # good
  sh -c 'exec 1>/dev/null; systemctl status tmp.mount'

with:

  # bad
  sh -c 'exec 1>&-; systemctl status tmp.mount'

The diff is:

  # diff -u /tmp/good /tmp/bad
  --- /tmp/good 2019-08-14 18:11:20.100792406 +1000
  +++ /tmp/bad  2019-08-14 18:11:24.882886972 +1000
  @@ -6,7 +6,7 @@
   fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
   epoll_ctl(4, EPOLL_CTL_ADD, 24, {0, {u32=107062384, u64=94382013391984}}) = 0
   epoll_ctl(4, EPOLL_CTL_MOD, 24, {EPOLLIN|EPOLLOUT, {u32=107062384, 
u64=94382013391984}}) = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, 
it_value={1136546, 444853000}}, NULL) = 0
  +timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, 
it_value={1136551, 444853000}}, NULL) = 0
   epoll_wait(4, [{EPOLLOUT, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
   epoll_ctl(4, EPOLL_CTL_MOD, 24, {EPOLLIN, {u32=107062384, 
u64=94382013391984}}) = 0
   epoll_wait(4, [{EPOLLIN, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
  @@ -23,13 +23,7 @@
   openat(AT_FDCWD, "/usr/local/lib/systemd/system/tmp.mount.d", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or 
directory)
   openat(AT_FDCWD, "/usr/lib/systemd/system/tmp.mount.d", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or 
directory)
   openat(AT_FDCWD, "/run/systemd/generator.late/tmp.mount.d", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or 
directory)
  -epoll_wait(4, [{EPOLLIN|EPOLLHUP, {u32=107062384, u64=94382013391984}}], 58, 
-1) = 1
  -epoll_ctl(4, EPOLL_CTL_MOD, 24, {0, {u32=107062384, u64=94382013391984}}) = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 1}}, 
NULL) = 0
  -epoll_wait(4, [{EPOLLHUP, {u32=107062384, u64=94382013391984}}, {EPOLLIN, 
{u32=3, u64=3}}], 58, -1) = 2
  -read(3, "\1\0\0\0\0\0\0\0", 8)  = 8
  +epoll_wait(4, [{EPOLLIN, {u32=107062384, u64=94382013391984}}], 58, -1) = 1
   epoll_ctl(4, EPOLL_CTL_DEL, 24, NULL)   = 0
  -close(24)   = 0
  -timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, 
it_value={1136622, 194853000}}, NULL) = 0
   epoll_wait(4, strace: Process 1 detached
  

So it looks like systemd is removing the file descriptor from the epoll 
instance in both cases. However, in the "bad" case no EOF was reached, and 
the file descriptor is never closed. Looking at `ss` there is still 387 
bytes in this descriptor's receive queue.
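A quick way to spot such half-dead connections (a sketch; in `ss -x` output
the third column is Recv-Q):

  # list private-socket connections that still have unread bytes queued
  ss -x | awk '/\/run\/systemd\/private/ && $3 > 0'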

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-13 Thread Brian Reichert
On Thu, Aug 01, 2019 at 07:18:20PM +, Zbigniew Jędrzejewski-Szmek wrote:
> Yes. (With the caveat that there *are* legitimate reasons to have new
> long-lived fds created, so not every long-lived fd is "wrong".)

I finally was able to track down what's happening on my system.

This is sufficient to reproduce the effect of increasing the number
of file descriptors open to /run/systemd/private; at least, on my
box, in its current state:

  sh -c 'exec 1>&-; /usr/bin/systemctl status ntpd.service'

We have a cronjob that closes STDOUT, remaps STDERR to a log file,
and runs this systemctl command.  In my environment, this one-liner
will cause that FD count to go up by one, 100% reproducible.

Somehow, closing STDOUT is necessary to see this.

FWIW, the strace effort didn't yield anything; instead, I configured
auditd to reveal when systemctl was invoked, and found a pattern
of invocations I was able to backtrack to the cronjob.

> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-01 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Aug 01, 2019 at 08:46:58AM -0400, Brian Reichert wrote:
> On Thu, Aug 01, 2019 at 08:17:01AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > The kernel will use the lower-numbered available fd, so there's lot of
> > "reuse" of the same numbers happening. This strace means that between
> > each of those close()s here, some other function call returned fd 19.
> > Until we know what those calls are, we cannot say why fd19 remains
> > open. (In fact, the only thing we can say for sure, is that the
> > accept4() call shown above is not relevant.)
> 
> So, what I propose at this step:
> 
> - Restart my strace, this time using '-e trace=desc' (Trace all
>   file descriptor related system calls.)
> 
> - Choose to focus on a single descriptor; when I passively notice
>   that '19' has been reused a couple of time, stop the trace.

Well, no. If you notice that '19' has STOPPED being reused, then
stop the trace. If it is being reused, then it's not "leaked".

Zbyszek

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-01 Thread Brian Reichert
On Thu, Aug 01, 2019 at 08:17:01AM +, Zbigniew Jędrzejewski-Szmek wrote:
> The kernel will use the lower-numbered available fd, so there's lot of
> "reuse" of the same numbers happening. This strace means that between
> each of those close()s here, some other function call returned fd 19.
> Until we know what those calls are, we cannot say why fd19 remains
> open. (In fact, the only thing we can say for sure, is that the
> accept4() call shown above is not relevant.)

So, what I propose at this step:

- Restart my strace, this time using '-e trace=desc' (Trace all
  file descriptor related system calls.)

- Choose to focus on a single descriptor; when I passively notice
  that '19' has been reused a couple of times, stop the trace.

That should give me a smaller trace to analyze.

> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-08-01 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Jul 31, 2019 at 11:37:31AM -0400, Brian Reichert wrote:
> On Wed, Jul 31, 2019 at 12:36:41AM +0300, Uoti Urpala wrote:
> > On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> > > I see, between 13:49:30 and 13:50:01, I see 25 'successful' calls
> > > for close(), e.g.:
> > > 
> > >   13:50:01 close(19)  = 0
> > > 
> > > Followed by getsockopt(), and a received message on the supposedly-closed
> > > file descriptor:
> > > 
> > >   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, 
> > > gid=0}, [12]) = 0
> > 
> > Are you sure it's the same file descriptor? You don't explicitly say
> > anything about there not being any relevant lines between those. Does
> > systemd really just call getsockopt() on fd 19 after closing it, with
> > nothing to trigger that? Obvious candidates to check in the strace
> > would be an accept call returning a new fd 19, or epoll indicating
> > activity on the fd (though I'd expect systemd to remove the fd from the
> > epoll set after closing it).
> 
> My analysis is naive.
> 
> There was an earlier suggestion to use strace, limiting it to a
> limited number of system calls.
> 
> I then used a simple RE to look for the string '(19', to see calls where
> '19' was used as an initial argument to system calls.  That's way too
> simplistic.
> 
> To address some of your questions/points.
> 
> - No, I don't know if it's the same file descriptor.  I could not
>   start strace early enough to catch the creation of several dozen
>   file descriptors.

This shouldn't really matter. We care about descriptors which are
created while the process is running, so it is not a problem if we
"miss" some that are created early.

>   13:50:01 accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
> [12]) = 0
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
> [12]) = 0
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 13:50:01 
> getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) 
> = 0 13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 
> 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, 
> msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
>   13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 
> 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, 
> {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 
> 13:50:01 sendmsg(19, {msg_name(0)=NULL, 
> msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0
>  
> \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0",
>  128}, 
> {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0",
>  80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE 
> (Broken pipe)
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0
>   13:50:01 close(19)  = 0

The kernel will use the lower-numbered available fd, so there's lot of
"reuse" of the same numbers happening. This strace means that between
each of those close()s here, some other function call returned fd 19.
Until we know what those calls are, we cannot say why fd19 remains
open. (In fact, the only thing we can say for sure, is that the
accept4() call shown above is not relevant.)
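A sketch of how to find those calls, assuming the trace has been saved to a
file (path as used elsewhere in the thread):

  # every syscall in the trace that *returned* fd 19, i.e. whatever
  # re-created the descriptor between the close() calls
  grep -nE '= 19$' /home/systemd.strace | less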

Zbyszek

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-31 Thread Brian Reichert
On Wed, Jul 31, 2019 at 12:36:41AM +0300, Uoti Urpala wrote:
> On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> > Between 13:49:30 and 13:50:01, I see 25 'successful' calls
> > for close(), e.g.:
> > 
> > 13:50:01 close(19)  = 0
> > 
> > Followed by getsockopt(), and a received message on the supposedly-closed
> > file descriptor:
> > 
> >   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, 
> > gid=0}, [12]) = 0
> 
> Are you sure it's the same file descriptor? You don't explicitly say
> anything about there not being any relevant lines between those. Does
> systemd really just call getsockopt() on fd 19 after closing it, with
> nothing to trigger that? Obvious candidates to check in the strace
> would be an accept call returning a new fd 19, or epoll indicating
> activity on the fd (though I'd expect systemd to remove the fd from the
> epoll set after closing it).

My analysis is naive.

There was an earlier suggestion to use strace, limiting it to a
limited number of system calls.

I then used a simple RE to look for the string '(19', to see calls where
'19' was used as an initial argument to system calls.  That's way too
simplistic.

To address some of your questions/points.

- No, I don't know if it's the same file descriptor.  I could not
  start strace early enough to catch the creation of several dozen
  file descriptors.

- I didn't say anything about lines between those that I cited, as
  I could not ascertain relevance.

- And I completely missed the case of the accept4() calls returning
  the value of 19, among other cases where '19' shows up as a value.

A regex-based search is certainly inconclusive, but now I'm using this:

  egrep -e '[^0-9:]19(\)|\}|\]|,)?' /home/systemd.strace.trimmed | less

The rhythm now seems to be more like this:

  13:50:01 accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19
  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
[12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
[12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 13:50:01 
getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 
0 13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 
30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, 
msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
  13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 
9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 
0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 13:50:01 
sendmsg(19, {msg_name(0)=NULL, 
msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0
 
\0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0",
 128}, 
{"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0",
 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE 
(Broken pipe)
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  13:50:01 close(19)  = 0
  ...

Mind you, I see a _lot_ more close() calls than accepts():

  localhost:~ # egrep -e '[^0-9:]19(\)|\}|\]|,)?' /home/systemd.strace.trimmed 
> /home/systemd.strace.trimmed.19
  localhost:~ # grep accept4\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 
2- | sort -u
  accept4(13, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19
  localhost:~ # grep accept4\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 
2- | wc -l
  55
  localhost:~ # grep close\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- 
| sort -u
  close(19)  = 0
  localhost:~ # grep close\( /home/systemd.strace.trimmed.19 | cut -d' ' -f 2- 
| wc -l
  1051

I'm not asserting the frequencies are indicative of anything wrong;
I'm just more used to a 1:1 correlation.

And, again, the age of the file in /proc/1/fd/19 never seems to
change:

  localhost:~ # ls -ld --full-time /proc/1/fd/19
  lrwx-- 1 root root 64 2019-07-30 15:45:25.531468318 -0400 /proc/1/fd/19
  -> socket:[27085]
  localhost:~ # date
  Wed Jul 31 11:31:37 EDT 2019

That may be a red herring. I have been assuming that if an FD was
closed, then reopened/recreated by a process, that file would have
a new age.






-- 
Brian Reichert  
BSD admin/developer 

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-30 Thread Uoti Urpala
On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> Between 13:49:30 and 13:50:01, I see 25 'successful' calls
> for close(), e.g.:
> 
>   13:50:01 close(19)  = 0
> 
> Followed by getsockopt(), and a received message on the supposedly-closed
> file descriptor:
> 
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
> [12]) = 0

Are you sure it's the same file descriptor? You don't explicitly say
anything about there not being any relevant lines between those. Does
systemd really just call getsockopt() on fd 19 after closing it, with
nothing to trigger that? Obvious candidates to check in the strace
would be an accept call returning a new fd 19, or epoll indicating
activity on the fd (though I'd expect systemd to remove the fd from the
epoll set after closing it).



Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-30 Thread Brian Reichert
On Thu, Jul 11, 2019 at 08:35:38PM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote:
> > Does that sound like expected behavior?
> 
> No, this shouldn't happen.
> 
> What I was trying to say, is that if you have the strace log, you
> can figure out what created the stale connection and what the dbus
> call was, and from all that info it should be fairly simply to figure
> out what the calling command was. Once you have that, it'll be much
> easier to reproduce the issue in controlled setting and look for the
> fix.

I'm finally revisiting this. I haven't found a way to get a trace
to start early enough to catch the initial open() on all of the
targeted file descriptors, but I'm trying to make do with what I
have.

To sum up, in my naive analysis, I see close() called many times
on a file descriptor. I then see more messages come in on that same
descriptor.  But the timestamp of the descriptor in /proc never
changes.

I created a service to launch strace as early as I can figure:

  localhost:~ # cat /usr/lib/systemd/system/systemd_strace.service
  [Unit]
  Description=strace systemd
  DefaultDependencies=no
  After=local-fs.target
  Before=sysinit.target
  ConditionPathExists=!/etc/initrd-release
  
  [Service]
  ExecStart=/usr/bin/strace -p1 -t -o /home/systemd.strace -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
  ExecStop=/bin/echo systemd_strace.service will soon exit
  Type=simple
  
  [Install]
  WantedBy=multi-user.target
  
I introduced the '-t' flag, so I'd get timestamps on the recorded
entries.

I rebooted the server, and after ~20 minutes, I found stale
descriptors that seem to date to when the host first booted.

Note the age of them, relative to the boot time, and they have no
connected peers.

  localhost:~ # uptime
   14:10pm  up   0:21,  3 users,  load average: 0.81, 0.24, 0.15
  localhost:~ # date
  Tue Jul 30 14:10:09 EDT 2019
  localhost:~ # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "",
  $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | tail -5
  lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 14 -> 
socket:[28742]
  lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 16 -> 
socket:[35430]
  lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 17 -> 
socket:[37758]
  lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 18 -> 
socket:[41044]
  lrwx-- 1 root root 64 2019-07-30 13:49:25.458694632 -0400 19 -> 
socket:[43411]
  localhost:~ # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
  0

This is an XFS filesystem, so I can't directly get the creation
time of my trace file, but I can see the first entry is timestamped
'13:49:07'.

I copied the trace file aside, and edited that copy to trim everything
off after 14:10:09, when I ran that 'date' command above.

As early as I tried to start this trace, dozens of file descriptors
had already been created.

Trying to focus on FD 19 (the oldest connection to /run/systemd/private):

Between 13:49:30 and 13:50:01, I see 25 'successful' calls
for close(), e.g.:

13:50:01 close(19)  = 0

Followed by getsockopt(), and a received message on the supposedly-closed
file descriptor:

  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
[12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, 
[12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
  13:50:01 getsockname(19, {sa_family=AF_LOCAL, 
sun_path="/run/systemd/private"}, [23]) = 0
  13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 
30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, 
msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
  13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 
9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 
0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52 13:50:01 
sendmsg(19, {msg_name(0)=NULL, 
msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0\0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0",
 128}, 
{"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0",
 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE 
(Broken pipe)

I see a continuous stream of messages coming in on FD 19, through
the end of the trace, but the age of the file descriptor in /proc
never seems to change.

Am I misinterpreting something?

> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-11 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote:
> On Wed, Jul 10, 2019 at 10:44:14PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > That's ancient... 228 was released almost four years ago.
> 
> That's the joy of using a commercial Linux distribution; they tend
> to be conservative about updates.  SLES may very well have backported
> fixes to the packaged version they maintain.
> 
> They may also have a newer version of a systemd RPM for us to take.
> 
> I'm looking for an efficient way to repro the symptoms, so as to confirm
> whether a newer RPM solves this for us.
> 
> > > > > When we first spin up a new SLES12 host with our custom services,
> > > > > the number of connections to /run/systemd/private numbers in the
> > > > > mere hundreds. 
> > > 
> > > > That sounds wrong already. Please figure out what those connections
> > > > are. I'm afraid that you might have to do some debugging on your
> > > > own, since this issue doesn't seem easily reproducible.
> 
> Above, I cite a desire for reproducing the symptoms.  If you're
> confident that a newly-spun-up idle host should not hover at hundreds
> of connections, then hypothetically I could update the vendor-provided
> systemd RPM (if there is one), reboot, and see if the connection
> count is reduced.
> 
> > strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
> >
> > yields the relevant info. In particular, the pid, uid, and gid of the
> > remote is shown. My approach would be to log this to some file, and
> > then see which fds remain, and then look up this fd in the log.
> > The recvmsg calls contain the serialized dbus calls, a bit messy but
> > understandable. E.g. 'systemctl show systemd-udevd' gives something
> > like this:
> 
> Thanks for such succinct feedback; I'll see what I can get from this.
> 
> In my prior email, I showed how some of the connections were
> hours/days old, even with no connecting peer.
> 
> Does that sound like expected behavior?

No, this shouldn't happen.

What I was trying to say, is that if you have the strace log, you
can figure out what created the stale connection and what the dbus
call was, and from all that info it should be fairly simply to figure
out what the calling command was. Once you have that, it'll be much
easier to reproduce the issue in controlled setting and look for the
fix.

Zbyszek

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-11 Thread Brian Reichert
On Wed, Jul 10, 2019 at 10:44:14PM +, Zbigniew Jędrzejewski-Szmek wrote:
> That's ancient... 228 was released almost four years ago.

That's the joy of using a commercial Linux distribution; they tend
to be conservative about updates.  SLES may very well have backported
fixes to the packaged version they maintain.

They may also have a newer version of a systemd RPM for us to take.

I'm looking for an efficient way to repro the symptoms, so as to confirm
whether a newer RPM solves this for us.

> > > > When we first spin up a new SLES12 host with our custom services,
> > > > the number of connections to /run/systemd/private numbers in the
> > > > mere hundreds. 
> > 
> > > That sounds wrong already. Please figure out what those connections
> > > are. I'm afraid that you might have to do some debugging on your
> > > own, since this issue doesn't seem easily reproducible.

Above, I cite a desire for reproducing the symptoms.  If you're
confident that a newly-spun-up idle host should not hover at hundreds
of connections, then hypothetically I could update the vendor-provided
systemd RPM (if there is one), reboot, and see if the connection
count is reduced.

> strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
>
> yields the relevant info. In particular, the pid, uid, and gid of the
> remote is shown. My approach would be to log this to some file, and
> then see which fds remain, and then look up this fd in the log.
> The recvmsg calls contain the serialized dbus calls, a bit messy but
> understandable. E.g. 'systemctl show systemd-udevd' gives something
> like this:

Thanks for such succinct feedback; I'll see what I can get from this.

In my prior email, I showed how some of the connections were
hours/days old, even with no connecting peer.

Does that sound like expected behavior?

> HTH,
> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-10 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Jul 10, 2019 at 09:51:36AM -0400, Brian Reichert wrote:
> On Wed, Jul 10, 2019 at 07:37:19AM +, Zbigniew Jędrzejewski-Szmek wrote:
> 
> > It's a bug report like any other. Writing a meaningful reply takes time
> > and effort. Lack of time is a much better explanation than ressentiments.
> 
> I wasn't expressing resentment; I apologize if it came off that way.
> 
> > Please always specify the systemd version in use. We're not all SLES
> > users, and even if we were, I assume that there might be different
> > package versions over time.
> 
> Quite reasonable:
> 
>   localhost:/var/tmp # cat /etc/os-release
>   NAME="SLES"
>   VERSION="12-SP3"
>   VERSION_ID="12.3"
>   PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
>   ID="sles"
>   ANSI_COLOR="0;32"
>   CPE_NAME="cpe:/o:suse:sles:12:sp3"
> 
>   localhost:/var/tmp # rpm -q systemd
>   systemd-228-142.1.x86_64

That's ancient... 228 was released almost four years ago.

> > > When we first spin up a new SLES12 host with our custom services,
> > > the number of connections to /run/systemd/private numbers in the
> > > mere hundreds. 
> 
> > That sounds wrong already. Please figure out what those connections
> > are. I'm afraid that you might have to do some debugging on your
> > own, since this issue doesn't seem easily reproducible.
> 
> What tactics should I employ?  All of those file handles to
> /run/systemd/private are owned by PID 1, and 'ss' implies there are
> no peers.
> 
> 'strace' on PID 1 shows messages are flowing, but that doesn't reveal
> the logic about how the connections get created or culled, nor who
> initiated them.

strace -p1 -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999

yields the relevant info. In particular, the pid, uid, and gid of the
remote is shown. My approach would be to log this to some file, and
then see which fds remain, and then look up this fd in the log.
The recvmsg calls contain the serialized dbus calls, a bit messy but
understandable. E.g. 'systemctl show systemd-udevd' gives something
like this:

recvmsg(20, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\1\4\1\5\0\0\0\1\0\0\0\257\0\0\0\1\1o\08\0\0\0", 
iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
recvmsg(20, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/systemd_2dudevd_2eservice\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0GetAll\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0\0\0\0\0\0",
 ...
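A sketch of that workflow (output path, duration and the example fd are
arbitrary):

  strace -p1 -tt -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999 \
      -o /root/pid1.strace &
  STRACE_PID=$!
  sleep 600; kill "$STRACE_PID"

  # which descriptors does PID 1 still hold on the private socket?
  lsof -nP /run/systemd/private

  # then look a surviving fd (say, 19) up in the log to see which client
  # (pid/uid from SO_PEERCRED) created it and what it sent
  grep -nE '\(19,|= 19$' /root/pid1.strace | less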

HTH,
Zbyszek

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-10 Thread Brian Reichert
On Wed, Jul 10, 2019 at 07:37:19AM +, Zbigniew Jędrzejewski-Szmek wrote:

> It's a bug report like any other. Writing a meaningful reply takes time
> and effort. Lack of time is a much better explanation than ressentiments.

I wasn't expressing resentment; I apologize if it came off that way.

> Please always specify the systemd version in use. We're not all SLES
> users, and even if we were, I assume that there might be different
> package versions over time.

Quite reasonable:

  localhost:/var/tmp # cat /etc/os-release
  NAME="SLES"
  VERSION="12-SP3"
  VERSION_ID="12.3"
  PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
  ID="sles"
  ANSI_COLOR="0;32"
  CPE_NAME="cpe:/o:suse:sles:12:sp3"

  localhost:/var/tmp # rpm -q systemd
  systemd-228-142.1.x86_64

> > When we first spin up a new SLES12 host with our custom services,
> > the number of connections to /run/systemd/private numbers in the
> > mere hundreds. 

> That sounds wrong already. Please figure out what those connections
> are. I'm afraid that you might have to do some debugging on your
> own, since this issue doesn't seem easily reproducible.

What tactics should I employ?  All of those file handles to
/run/systemd/private are owned by PID 1, and 'ss' implies there are
no peers.

'strace' on PID 1 shows messages are flowing, but that doesn't reveal
the logic about how the connections get created or culled, nor who
initiated them.

On a box with ~500 of these file handles, I can see that many of
them are hours or days old:

  localhost:/var/tmp # date
  Wed Jul 10 09:45:01 EDT 2019

  # new ones
  localhost:/var/tmp # lsof -nP /run/systemd/private | awk '/systemd/ {
  sub(/u/, "", $4); print $4}' | (  cd /proc/1/fd; xargs ls -t --full-time ) | 
head -5
  lrwx-- 1 root root 64 2019-07-10 09:45:05.211722809 -0400 561 -> 
socket:[1183838]
  lrwx-- 1 root root 64 2019-07-10 09:40:02.611726025 -0400 559 -> 
socket:[1173429]
  lrwx-- 1 root root 64 2019-07-10 09:40:02.611726025 -0400 560 -> 
socket:[1176265]
  lrwx-- 1 root root 64 2019-07-10 09:33:10.687730403 -0400 100 -> 
socket:[113992]
  lrwx-- 1 root root 64 2019-07-10 09:33:10.687730403 -0400 101 -> 
socket:[115163]
  xargs: ls: terminated by signal 13

  # old ones
  localhost:/var/tmp # lsof -nP /run/systemd/private | awk '/systemd/ {
  sub(/u/, "", $4); print $4}' | (  cd /proc/1/fd; xargs ls -t --full-time ) | 
tail -5
  lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 59 -> 
socket:[43097]
  lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 60 -> 
socket:[44029]
  lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 63 -> 
socket:[46234]
  lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 65 -> 
socket:[49252]
  lrwx-- 1 root root 64 2019-07-08 15:12:04.725350882 -0400 71 -> 
socket:[54064]
  
> > Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private
> > correct?
> 
> Yes. The number is hardcoded because it's expected to be "large
> enough". The connection count shouldn't be more than "a few" or maybe
> a dozen at any time.

Thanks for confirming that.

> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-10 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jul 02, 2019 at 09:57:44AM -0400, Brian Reichert wrote:
> At $JOB, on some of our SLES12 boxes, our logs are getting swamped
> with messages saying:
> 
>   "Too many concurrent connections, refusing"

Please always specify the systemd version in use. We're not all SLES
users, and even if we were, I assume that there might be different
package versions over time.

>   # ss -x | grep /run/systemd/private | wc -l
>   4015

/run/systemd/private is used by systemctl and other systemd utilities
when running as root. Those connections are expected to be short-lived.
Generally, on a normal machine "ss -x | grep /run/systemd/private | wc -l"
is expected to yield 0 or a very low number transiently.

> But, despite the almost 4k connections, 'ss' shows that there are
> no connected peers:
> 
>   # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
>   0

Interesting. ss output is not documented at all from what I can see,
but indeed '* 0' seems to indicate that. It is possible that systemd
has a leak and is not closing the private bus connections properly.

> When we first spin up a new SLES12 host with our custom services,
> the number of connections to /run/systemd/private numbers in the
> mere hundreds. 
That sounds wrong already. Please figure out what those connections
are. I'm afraid that you might have to do some debugging on your
own, since this issue doesn't seem easily reproducible.

(I installed systemd with CONNECTIONS_MAX set to 10, and I can easily
saturate the number of available connections with
  for i in {1..11}; do systemctl status '*' & sleep 0.5; kill -STOP $!;done
As soon as I allow the processes to continue or kill them, the connection
count goes down. They never show up with '* 0'.)

> Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private
> correct?

Yes. The number is hardcoded because it's expected to be "large
enough". The connection count shouldn't be more than "a few" or maybe
a dozen at any time.

> I have a hypothesis that this may be some resource leak in systemd,
> but I've not found a way to test that.

Once you figure out what is creating the connection, it would be useful
to attach strace to pid 1 and see what is happening there.

Zbyszek

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-08 Thread Brian Reichert
On Tue, Jul 02, 2019 at 09:57:44AM -0400, Brian Reichert wrote:
> At $JOB, on some of our SLES12 boxes, our logs are getting swamped
> with messages saying:
> 
>   "Too many concurrent connections, refusing"
> 
> It's hampering our ability to manage services, e.g.:
> 
>   # systemctl status ntpd
>   Failed to get properties: Connection reset by peer

Can anyone at least confirm that the CONNECTIONS_MAX limit in dbus.c
does relate to the number of connections systemd has to
/run/systemd/private?

We can't even reliably reboot the server in question under these
circumstances...


-- 
Brian Reichert  
BSD admin/developer at large

[systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-02 Thread Brian Reichert
At $JOB, on some of our SLES12 boxes, our logs are getting swamped
with messages saying:

  "Too many concurrent connections, refusing"

It's hampering our ability to manage services, e.g.:

  # systemctl status ntpd
  Failed to get properties: Connection reset by peer

Near as I can tell from a quick read of the source of dbus.c, we're
hitting a hard-coded limit of CONNECTIONS_MAX (set to 4096).  I
think this is related to the number of connections systemd (pid 1)
has to /run/systemd/private, but I'm guessing here:

  # ss -x | grep /run/systemd/private | wc -l
  4015

But, despite the almost 4k connections, 'ss' shows that there are
no connected peers:

  # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
  0

The symptom here is that depending on system activity, systemd stops
being able to process new requests. systemd allows requests to come
in (e.g. via an invocation of 'systemctl'), but if I understand the
source of dbus.c, when there are too many connections to its
outgoing stream, systemd rejects the efforts, apparently with no
retry.

When we first spin up a new SLES12 host with our custom services,
the number of connections to /run/systemd/private numbers in the
mere hundreds.  As workloads increase, the number of connections
rises to the thousands.  Some hosts are plagued with the 'Too many
concurrent connections' errors, some are not. Empirically, all I've been
able to see is that the number of systemd's connections to
/run/systemd/private tips over 4k.

Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private
correct?

- I can't demonstrate that there are any consumers of this stream.
- I can't explain why the connection count increases over time.
- The CONNECTIONS_MAX constant is hard-coded, and it gets increased
  every few months/years, but never seems to be expressed as something
  you can set in a config file.
- I don't know what tunables affect the lifetime/culling of those
  connections.

I have a hypothesis that this may be some resource leak in systemd,
but I've not found a way to test that.
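One low-effort way to test that (a sketch; interval and log path are
arbitrary):

  # log the private-socket connection count once a minute; a count that only
  # ever climbs would support the leak hypothesis
  while sleep 60; do
      echo "$(date '+%F %T') $(ss -x | grep -c /run/systemd/private)"
  done >> /var/log/systemd-private-count.log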

-- 
Brian Reichert  
BSD admin/developer at large