Thank you for the proposal, Adam. In this case, the proposal has to be sent upstream to Erlang, as Elixir simply delegates the Port functionality to the Erlang VM.
*José Valim*
https://dashbit.co/

On Sun, Feb 16, 2025 at 5:46 AM Adam Wight <adam.m.wi...@gmail.com> wrote:

> While writing a library to integrate with an indivisibly long-running,
> external program (rsync), I came across the problem described in
> https://hexdocs.pm/elixir/Port.html#module-zombie-operating-system-processes
> and I think there may be some fundamental mistakes in the advice given
> there.
>
> Our analysis in the Port documentation says that a polite application will
> detect when its stdio communication pipes are closed and will then
> terminate itself. The fact that this is the case seems to be accidental,
> and is based on an empirical observation that most applications do some
> sort of I/O, so when one of the standard file descriptors closes the
> application will encounter a read or write error and will stop. However,
> there are plenty of applications which can and should continue beyond this
> condition, and there's even a utility `nohup(1)` for exactly the purpose of
> allowing applications to ignore problems with stdio, for example when
> they're launched and backgrounded from an interactive terminal that will be
> closed.
>
> An example of a utility which does no I/O and therefore ignores stdio file
> descriptor statuses by default is `sleep(1)`, and I don't think it would be
> correct to make it stop because stdio is closed. Running under elixir
> provides a good demonstration of the problem we're looking at here:
>
>     elixir -e 'System.cmd(System.find_executable("sleep"), ["60"])'
>
> Start that command and then kill the BEAM, and look for the sleep
> process. It should still be running, in process state "Ss".
>
> This will also demonstrate a second problem with the Port documentation,
> that the condition we're dealing with is an "orphan process" which is still
> running but is now unassociated with a BEAM parent and can no longer be
> controlled or communicated with by Elixir. Orphans are a bigger issue than
> "zombie processes", which have already terminated and will show up in `ps`
> output in state "Z", because an orphan can still cause side-effects and
> consume resources.
>
> Some helpful Internet posts led me to what I believe is the correct way to
> prevent an orphan child process, by calling it through an intermediate
> application similar to the one suggested by Port docs but using `prctl(2)`
> instead, which allows the intermediate to monitor the parent process (the
> BEAM) and kill its child if the parent is terminated. The code below still
> has a small race condition on launch, but I'll share it anyway:
>
> ```c
> #define _XOPEN_SOURCE 700
> #include <signal.h>
> #include <stddef.h>
> #include <stdlib.h>
> #include <sys/prctl.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> pid_t child_pid;
>
> void handle_signal(int signum) {
>   if (signum == SIGHUP && child_pid > 0) {
>     kill(child_pid, SIGKILL);
>   }
> }
>
> int main(int argc, char* argv[]) {
>   // Send this process a HUP if the parent BEAM VM dies.
>   // FIXME: race condition until this line, if the parent is already dead.
>   prctl(PR_SET_PDEATHSIG, SIGHUP);
>
>   // Listen for HUP and respond by killing the child process.
>   struct sigaction action;
>   action.sa_handler = handle_signal;
>   action.sa_flags = 0;
>   sigemptyset(&action.sa_mask);
>   sigaction(SIGHUP, &action, NULL);
>
>   child_pid = fork();
>   if (child_pid == 0) {
>     const char* command = argv[1];
>     for (int i = 0; i < argc; i++) {
>       argv[i] = argv[i + 1];
>     }
>     execv(command, argv);
>   } else {
>     waitpid(child_pid, NULL, 0);
>   }
>
>   return 0;
> }
> ```
>
> To try it out, save as main.c and compile like so:
>
>     cc -g -O3 -std=c99 -pedantic -o parent-monitor main.c
>
> This tool shifts its argv by one to construct the child process, so from
> the command line it would be called like `parent-monitor /usr/bin/sleep 60`
> and to exercise a BEAM crash you can call it like so (after adjusting the
> paths for your system):
>
>     elixir -e 'System.cmd("./parent-monitor", ["/usr/bin/sleep", "60"])'
>
> Now you can see the sleep process is killed as soon as the VM is stopped.
>
> Although it's a fringe issue since the BEAM is normally stopped only
> during development or when deploying new code, I feel like it could be
> useful to bundle the behavior into Elixir or Erlang itself. This could be
> seen as an elegant extension of the OTP supervisor tree principle beyond
> the VM boundary, it seems to have some real-world consequences, and it's an
> obscure problem for an application developer to solve from scratch each
> time.
>
> There's one more use case to mention, that such a behavior should probably
> be made optional, maybe as a flag to Port. I would imagine the default
> should be to use the wrapper, so the implicit option might look like
> `allow_orphan: false`. The rare case where we omit the wrapper would be
> when it's more useful to let the child continue than to maintain control
> over it.
>
> Kind regards,
> Adam Wight
>
> --
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elixir-lang-core+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/elixir-lang-core/CAF56aJK-R_gyTSLBmYE%3DsWMOHMgZyujAJZBO6sHO4x2tekC41w%40mail.gmail.com
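
On the FIXME in the quoted wrapper: one common way to narrow the launch race is to re-check the parent pid right after arming the death signal. The sketch below is only an illustration and is not part of the original proposal. The helper name `arm_parent_death_signal` and its return convention are hypothetical, and it assumes Linux (`prctl(2)` is Linux-specific) and that the caller records `getppid()` as early as possible.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original message).
 * Asks the kernel to send `signum` to this process when its parent exits,
 * then checks whether the parent seen at startup is still our parent.
 *
 * Returns  0 if the death signal is armed and the parent is unchanged,
 *          1 if the parent changed (we were reparented, so it already exited),
 *         -1 if prctl() failed.
 */
static int arm_parent_death_signal(pid_t parent_at_startup, int signum) {
  assert(signum > 0); /* 0 would clear the setting instead of arming it */

  if (prctl(PR_SET_PDEATHSIG, (unsigned long)signum) == -1) {
    return -1;
  }

  /* If the parent died before the prctl() call, no signal will ever
   * arrive; a changed parent pid is the only hint we get. */
  if (getppid() != parent_at_startup) {
    return 1;
  }
  return 0;
}
```

In the wrapper this could be called as `arm_parent_death_signal(getppid(), SIGHUP)` at the top of `main`, exiting instead of forking the child when it returns 1. This narrows the window rather than closing it: if the parent exits before that first `getppid()` call, the check cannot see it.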