On Sun, Mar 5, 2017 at 3:59 PM, Pascal Kolijn <p.kol...@vu.nl> wrote: > Peace, > > On 28/02/2017 16:00, Lennart Poettering wrote: >> On Tue, 28.02.17 13:26, Pascal kolijn (p.kol...@vu.nl) wrote: >> >>> Hi List, >>> >>> I've subscribed to this list to ask for help in debugging a problem we >>> seem to have with the socket activated telnetd on a rhel7 system. >>> >>> A default install of telnetd collects data from some small boxes >>> deployed in the field. It works for a long time and then suddenly: >>> >>> Feb 26 17:46:53 bibr systemd: Created slice user-6006.slice. >>> Feb 26 17:46:53 bibr systemd: Starting user-6006.slice. >>> Feb 26 17:46:53 bibr systemd: Started Session 2223341 of user <USER>. >>> Feb 26 17:46:53 bibr systemd-logind: New session 2223341 of user <USER>. >>> Feb 26 17:46:53 bibr systemd: Starting Session 2223341 of user <USER>. >>> Feb 26 17:46:53 bibr systemd: Started Telnet Server (<IP>:28830). >>> Feb 26 17:46:53 pbibr001 systemd: Starting Telnet Server (<IP>:28830)... >>> Feb 26 17:46:57 bibr systemd: Failed to fork: Cannot allocate memory >> >> Hmm, Linux fork() returns ENOMEM if the maximum number of tasks on the >> system is hit (yes this is a bit misleading, but that's how it is). >> That max number of tasks is limited for example by the max number of >> assignable pids as configured in /proc/sys/kernel/pid_max? Maybe you >> hit that limit? Maybe something is leaking pids on your system? not >> reaping zombies properly? > > As far as I can determine running out of pids is not the issue, as I can > see pids being reused in a day, which will not say that some may still > go missing over time, but how do I determine if that is the case...? > > What I do see is that the rss of the systemd process is slowly growing > over time in the production environment. I've not been able (yet) to > reproduce the situation in a test environment, which is a pity. I think > I can simulate the telnet connects more accurately after I speak with > the developer of the said boxes, and see if I can create a reproducible > situation. > >>> Feb 26 17:46:57 bibr systemd: Assertion 'pid >= 1' failed at >>> src/core/unit.c:1996, function unit_watch_pid(). Aborting. >>> Feb 26 17:46:57 bibr001 systemd: Caught <ABRT>, cannot fork for core >>> dump: Cannot allocate memory >>> Feb 26 17:46:57 bibr systemd: Freezing execution. >> >> So this is definitely a bug. If the limit is hit, we hould certainly >> not hit an assert. I tried to figure out how this could ever happen, >> but afaics this should not be possible on current git at least. Any >> chance you can try to reproduce this isue with something more recent >> than a rhel7 box? > > Hmmm, the version we currently use in production is: > > # rpm -qa | grep systemd > systemd-libs-219-19.el7_2.13.x86_64 > systemd-219-19.el7_2.13.x86_64 > systemd-sysv-219-19.el7_2.13.x86_64
I've backported bunch of fixes for memory leaks to systemd-219-19.el7_2.14. From changelog, * Mon Aug 22 2016 Lukas Nykryn <lnyk...@redhat.com> - 219-19.14 - core: fix memory leak on set-default, enable, disable etc (#1331667) - nspawn: fix minor memory leak (#1331667) - basic: fix error/memleak in socket-util (#1331667) - core: fix memory leak in manager_run_generators() (#1331667) - modules-load: fix memory leak (#1331667) - core: fix memory leak on failed preset-all (#1331667) - sd-bus: fix memory leak in test-bus-chat (#1331667) - core: fix memory leak in transient units (#1331667) Fix is in the code path that is hit everytime you log onto the box, because every session has its own scope unit. - bus: fix leak in error path (#1331667) - shared/logs-show: fix memleak in add_matches_for_unit (#1331667) > > I think I can update it to the current state in 7.3 for the production > machine, but will be reluctant to go for a more recent version... Those fixes are of course included in 7.3 as well. Michal > > Maybe in the test env, if I can reproduce it there. > >> Either way it appears that there's both a bug on your setup and in >> systemd: something leaks processes (which is bug #1, in your setup) >> and then systemd doesn't deal properly with that (which is bug #2, in >> systemd upstream)... >> >> Lennart >> > > Pascal Kolijn > Vrije Universiteit Amsterdam > _______________________________________________ > systemd-devel mailing list > systemd-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/systemd-devel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel