Hi Reindl,

With all due respect, it would be very helpful if you could be a little
less snarky.

i don't get the bash-nonsense for a handful of lines (most of them doing
nothing at all) to begin with and given that there is no "Type=" in the
unit file you may read the docs and try the different types
>> As per the man page
(https://man7.org/linux/man-pages/man5/systemd.service.5.html), the default
Type is simple if ExecStart= is specified and neither Type= nor BusName= is
set.
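
For reference, making that default explicit in our unit would look roughly
like the sketch below. The ExecStart= line is taken from the status output in
my first mail and TimeoutStopSec= is set as described there; using %i to pass
the instance name is an assumption about how the template is written.

```ini
# Sketch of the [Service] section of trial@.service with the type spelled out.
[Service]
# Same as the default systemd picks when only ExecStart= is given.
Type=simple
# %i (the part after "@") is assumed to carry "sleep" or "dhclient_<interface>".
ExecStart=/bin/bash /etc/systemd/system/trial_service_script.sh start %i
# Kept at infinity because core collection after a crash takes varying time.
TimeoutStopSec=infinity
```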

i also don't get the trial-binary
why in the world don't you throw away all that crap including the docker
container and start dhclient at your own from a trivial systemd-unit?
>> As the name indicates, it is a trial, i.e. a minimal reproduction of the
issue we are seeing. The actual issue: we have a binary that starts and stops
dhclient on an interface on demand (please don't come back asking why we
would need to start and stop dhclient on demand). If the binary crashes for
some unforeseen reason (which I also mentioned in my initial mail; quoting it
again here: "In the real-world, this can be a SIGSEGV indicating a crash in
the parent process."), the service gets stuck in the "deactivating" state
forever. We see this only on Ubuntu 16.04 and all CentOS 7.x versions; we do
not see it on Ubuntu 20.04.

why in the world don't you throw away all that crap including the docker
container and start dhclient at your own from a trivial systemd-unit?
>> Again, in a real-world scenario we support 1000+ VLANs. Our initial
assessment was that running a separate service for each of 1000 dhclient
instances could be too costly, so we ran "dhclient <interface> -nw" from the
parent process. If you believe systemd can handle such a load without a
significant performance impact, knowing that would be helpful as well.
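
To make the per-interface alternative concrete, a template unit along the
following lines is what we would have to instantiate 1000+ times. This is only
a sketch: the unit name dhclient@.service is hypothetical, and the dhclient
path and flags (-4, and -d to stay in the foreground) are assumptions that we
have not tested in our setup.

```ini
# /etc/systemd/system/dhclient@.service (hypothetical name and location)
[Unit]
Description=DHCP client on %I

[Service]
# -d keeps dhclient in the foreground, so systemd tracks it as the main process.
Type=simple
ExecStart=/usr/sbin/dhclient -4 -d %i
Restart=on-failure
```

Each interface would then get its own instance, e.g. "systemctl start
dhclient@eth1.service".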

Regards,
Aravindhan Krishnan...


On Tue, 8 Jun 2021 at 18:35, Reindl Harald <h.rei...@thelounge.net> wrote:

>
>
> Am 08.06.21 um 14:50 schrieb Aravindhan Krishnan:
> > Hi Reindl,
> >
> > I have attached a minimalistic repro along with the codes of all the
> > scripts, service files. I suppose Silvio was able to see the files.
>
> i don't get the bash-nonsense for a handful of lines (most of them doing
> nothing at all) to begin with and given that there is no "Type=" in the
> unit file you may read the docs and try the different types
>
> i also don't get the trial-binary
>
> why in the world don't you throw away all that crap including the docker
> container and start dhclient at your own from a trivial systemd-unit?
>
> it's impressive how many layers and helpers one can wrap around simple
> tasks but to gain what except troubles?
>
> keep it simple!
>
> > On Mon, 7 Jun 2021 at 21:53, Reindl Harald <h.rei...@thelounge.net> wrote:
> >
> >
> >
> >     Am 07.06.21 um 17:57 schrieb Aravindhan Krishnan:
> >      > Adding Raghav.
> >      >
> >      > And sorry the subject should have stated: Discrepancy in using
> >      > dhclient b/w ubuntu 20.04 and ubuntu 16.04
> >
> >     and why didn't you fix it in your own reply?
> >
> >     to your problem:
> >     you have a wild mix of docker, systemd-units and shellscripts but
> >     don't provide the source of the scripts nor the systemd unit
> >
> >     overly complex for something that can be trivial as:
> >
> >     [root@srv-rhsoft:~]$ cat /etc/systemd/system/network-wan-dhcp.service
> >     [Unit]
> >     Description=Internet DHCP-Client
> >
> >     [Service]
> >     Type=forking
> >     ExecStart=/usr/sbin/dhclient -4 -q --no-pid --request-options subnet-mask,broadcast-address,routers br-wan
> >     PermissionsStartOnly=yes
> >     SuccessExitStatus=80
> >     Restart=always
> >     RestartSec=5
> >     ProtectSystem=strict
> >     ProtectHome=yes
> >     ReadWritePaths=-/var/lib/dhclient
> >     PrivateTmp=yes
> >     NoNewPrivileges=yes
> >     ProtectKernelTunables=yes
> >     ProtectKernelModules=yes
> >     ProtectControlGroups=yes
> >     MemoryDenyWriteExecute=yes
> >     CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
> >     LockPersonality=yes
> >     PrivateDevices=yes
> >     ProtectHostname=yes
> >     RestrictNamespaces=yes
> >     RestrictRealtime=yes
> >     RestrictSUIDSGID=yes
> >     ProtectClock=true
> >     ProtectKernelLogs=true
> >     UMask=077
> >     SystemCallArchitectures=native
> >     SystemCallFilter=@system-service @network-io @privileged
> >     SystemCallFilter=~@aio @chown @clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @resources @swap
> >     InaccessiblePaths=-/boot
> >     InaccessiblePaths=-/efi
> >     InaccessiblePaths=-/root
> >
> >      > On Mon, 7 Jun 2021 at 21:26, Aravindhan Krishnan
> >      > <aravindhan...@gmail.com> wrote:
> >      >
> >      >     Hi Folks,
> >      >
> >      >     I am finding anomalous behavior when I am trying to run dhclient
> >      >     process inside my docker container in vanilla Ubuntu 16.04 host.
> >      >     The service gets into "deactivating" state and is stuck forever.
> >      >     In the mail I have attached a minimalistic reproduction of the
> >      >     issue seen.
> >      >
> >      >     Working logic:
> >      >
> >      >       * There is a sample trial@.service script which invokes the
> >      >         `trial` binary with the option passed to the systemd service
> >      >         via @ option
> >      >       * The valid options are sleep and dhclient_<interface_name>
> >      >       * The binary either invokes a long-lived sleep process or
> >      >         dhclient process on the said interface_name based on the input
> >      >       * The binary then spawns `kill_trial.sh` script. The script
> >      >         sleeps for 20 seconds and kills the parent `trial` binary. The
> >      >         kill signal is SIGKILL in the trial example. In the
> >      >         real-world, this can be a SIGSEGV indicating a crash in the
> >      >         parent process.
> >      >       * If the trial binary was started for sleep process things work
> >      >         fine and service goes into "failed" state as expected
> >      >       * However, in case of dhclient, the service is stuck in
> >      >         "deactivating" state if the underlying host OS is Ubuntu 16.04.
> >      >         This works well if the host is running Ubuntu 20.04.
> >      >       * We have kept TimeoutStopSec to infinity, because in real-world
> >      >         deployments, the core collection post a crash takes varying
> >      >         time depending on the memory config on the host.
> >      >
> >      >
> >      >     Steps to reproduce
> >      >     # tar -xf minimal_repro.tar -C minimal_repro/
> >      >     # cd minimal_repro/
> >      >     # docker build -t trial .
> >      >     # docker rm -f trial
> >      >     # docker run -it -d --net=host --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro --name trial trial
> >      >     # docker exec -it trial bash
> >      >
> >      >     # systemctl start trial@dhclient_eth1.service
> >      >
> >      >     # #Keep monitoring trial@dhclient_eth1.service -- issue should be seen within 20-30 seconds on Ubuntu 16.04 host
> >      >
> >      >     # systemctl status trial@dhclient_eth1.service
> >      >     ● trial@dhclient_eth1.service - Trial
> >      >           Loaded: loaded (/etc/systemd/system/trial@.service; static; vendor preset: enabled)
> >      >           Active: deactivating (stop-sigterm) (Result: signal) since Mon 2021-06-07 13:19:12 UTC; 1min 11s ago
> >      >          Process: 55 ExecStartPre=/bin/bash /etc/systemd/system/trial_service_script.sh pre_start dhclient_eth1 (code=exited, status=0/SUCCESS)
> >      >          Process: 56 ExecStart=/bin/bash /etc/systemd/system/trial_service_script.sh start dhclient_eth1 (code=killed, signal=KILL)
> >      >         Main PID: 56 (code=killed, signal=KILL)
> >      >            Tasks: 0 (limit: 38590)
> >      >           Memory: 588.0K
> >      >           CGroup: /docker/903fca0cee1387b7c2113a36ee5efdb3a25edd1e60584fe5da5d0c5b5ffd8241/system.slice/system-trial.slice/trial@dhclient_eth1.service
> >      >
> >      >     # #NOTE: `Active: deactivating` -- in stuck state
> >      >     # #Running `systemctl daemon-reload` forces the service to go to failed state
> >      >
> >      >     # systemctl start trial@sleep.service
> >      >
> >      >     # #Keep monitoring trial@sleep.service -- would be killed in 20-30 seconds and goes into failed state as expected
> >      >
> >      >     # systemctl status trial@sleep.service
> >      >     ● trial@sleep.service - Trial
> >      >           Loaded: loaded (/etc/systemd/system/trial@.service; static; vendor preset: enabled)
> >      >           Active: failed (Result: signal) since Mon 2021-06-07 13:38:19 UTC; 21s ago
> >      >          Process: 113 ExecStartPre=/bin/bash /etc/systemd/system/trial_service_script.sh pre_start sleep (code=exited, status=0/SUCCESS)
> >      >          Process: 114 ExecStart=/bin/bash /etc/systemd/system/trial_service_script.sh start sleep (code=killed, signal=KILL)
> >      >          Process: 129 ExecStopPost=/bin/bash /etc/systemd/system/trial_service_script.sh post_stop sleep (code=exited, status=0/SUCCESS)
> >      >         Main PID: 114 (code=killed, signal=KILL)
> >      >
> >      >     Please advise on what can help us in alleviating the issue.
>
>
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel