Let's step back again and try to check our goals...
We could try to optimize the one-reaper-thread-per-subprocess thing.
But that is risky, and the cost of what we're doing today is not
that high.
We could try to implement the feature of killing off an entire
subprocess tree. But historically, any kind of behavior change like
that has been vetoed. I have tried and failed to make less
incompatible changes. We would have to add a new API.
The reality is that Java does not give you real access to the
underlying OS, and unless there's a seriously heterodox attempt to
provide OS-specific extensions, people will have to continue to
either write native code or delegate to an OS-savvy subprocess like
a Perl script.
On Fri, Apr 11, 2014 at 7:52 AM, Peter Levart
<peter.lev...@gmail.com> wrote:
On 04/09/2014 07:02 PM, Martin Buchholz wrote:
On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart
<peter.lev...@gmail.com> wrote:
Hi Martin,
As you might have seen in my later reply to Roger, there's
still hope on that front: setpgid() + waitpid(-pgid, ...)
might be the answer. I'm exploring in that direction.
Shells are doing it, so why can't JDK?
It's a little trickier for the Process API, since I imagine
that shells form a group of processes from a pipeline which
is known in advance, while the Process API will have to add
processes to the live group dynamically. So some races will
have to be resolved, but I think it's doable.
This is a clever idea, and it's arguably better to design
subprocesses so they live in separate process groups (emacs
does that), but:
Every time you create a process group, you change the effect of
a user signal like Ctrl-C, since it's sent to only one group.
Maybe propagate signals to the subprocess group? It's starting
to get complicated...
Hi Martin,
Yes, shells send Ctrl-C (SIGINT) and other signals initiated by
the terminal to a (foreground) process group. A process group is
formed from a pipeline of interconnected processes. Each
pipeline is considered to be a separate "job", hence shells call
this feature "job control". Child processes by default inherit
the process group of their parent, so children born with the
Process API (and their children) inherit the process group from
the JVM process. Considering the intentions of shell job control, is
propagating SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to children
spawned by Process API desirable? If so, then yes, handling
those signals in JVM and propagating them to current process
group that contains all children spawned by Process API and
their descendants would have to be performed by JVM. That
problem would certainly have to be addressed. But let's first
see what I found out about sigaction(SIGCHLD, ...), setpgid(pid,
pgid), waitpid(-pgid, ...), etc...
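To make the default inheritance and the effect of setpgid(pid, pgid) concrete, here is a minimal C sketch. It is only an illustration, not anything from the JDK: the names spawn_child and groups_differ_after_setpgid are hypothetical. One child is left in the inherited group and a second is moved into a new group that it leads. setpgid is called on both sides of the fork so the result does not depend on which process runs first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks until the pipe's write end
 * is closed everywhere, then exits. If new_group is nonzero, the child is
 * moved into a new process group whose ID equals the child's pid.
 * Returns the child's pid, or -1 if fork failed. */
static pid_t spawn_child(int new_group, int pipefd[2])
{
    assert(pipefd != NULL);
    pid_t child = fork();
    if (child == 0) {
        char c;
        if (new_group)
            setpgid(0, 0);        /* child side of the setpgid pair */
        close(pipefd[1]);         /* drop the inherited write end */
        while (read(pipefd[0], &c, 1) > 0) {
            /* keep reading until EOF */
        }
        _exit(0);
    }
    if (child > 0 && new_group)
        setpgid(child, child);    /* parent side; the child makes the same
                                     call, so a failure here is ignored */
    return child;
}

/* Returns 1 if the first child kept the parent's group and the second one
 * ended up in its own group, 0 if not, -1 on a setup error. */
int groups_differ_after_setpgid(void)
{
    int pipefd[2];
    int ok = -1;

    if (pipe(pipefd) != 0)
        return -1;
    pid_t inherited = spawn_child(0, pipefd);
    pid_t moved = spawn_child(1, pipefd);
    if (inherited > 0 && moved > 0)
        ok = (getpgid(inherited) == getpgrp()) && (getpgid(moved) == moved);

    /* Closing the write end releases both children; then reap them. */
    close(pipefd[1]);
    close(pipefd[0]);
    if (inherited > 0)
        waitpid(inherited, NULL, 0);
    if (moved > 0)
        waitpid(moved, NULL, 0);
    return ok;
}
```

The second child no longer receives signals sent to the parent's group, which is the Ctrl-C concern raised earlier in the thread.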
waitpid(-pgid, ...) alone does not seem to be enough for our task,
mainly because a process can reassign its group and join some
other group. I don't know if this situation occurs in the
real world, but imagine that we have one live child process in
process group pgid1 and no unwaited exited children. If we issue:
waitpid(-pgid1, &status, 0);
Then this call blocks, because at the time it was issued, there
was at least one child process in the pgid1 group and none had
exited yet. Now if this one child process changes its process
group with:
setpgid(0, pgid2);
Then the waitpid call in the parent does not return (maybe this
is a bug in Linux?) although there are no live child
processes left in the pgid1 group. Even when this child
exits, the call to waitpid does not return, since this child is
not in the group we are waiting for when it exits. If all our
children "escape" the group in such a way, the thread doing the
waiting will never unblock. To solve this, we can employ signal
handlers. In a signal handler for the SIGCHLD signal we can invoke:
waitpid(-pgid1, &status, WNOHANG); // non-blocking call
...in a loop until it returns either 0, which means that there are
no more unwaited exited children in the group at the moment, or
-1 with errno == ECHILD, which means that there are no more
children in the queried group at all: the group does not
exist any more. Since the signal handler is invoked with SIGCHLD
masked and there is one bit of pending signal state in the
kernel, no child exit can be "skipped" this way. Unless the
child "escapes" by changing its group. I don't know of a
plausible reason for a program to change its process group. If
a program executing as a JVM child wants to become a background
daemon, it usually behaves as follows:
- fork()s a grand-child and then exit()s (so we get notified via
signal and successfully waitpid(-pgid, ...) for its exit status)
- the grand-child then changes its session and group (becomes
session and group leader), closes file descriptors, etc. The
responsibility for waiting on the grand-child daemon is
transferred to the init process (pid=1) since the grand-child
becomes an orphan (has no parent).
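The SIGCHLD handler loop and the daemon-style child described above can be sketched together in C. This is a rough illustration under stated assumptions, not the prototype itself: the names on_sigchld, watched_pgid and run_daemonizing_child are hypothetical, the child is placed in its own process group, the intermediate child exits with an arbitrary code of 7, and the "daemon" grand-child exits immediately instead of doing any work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count;    /* children reaped so far */
static volatile sig_atomic_t last_exit_code;  /* exit code of last reaped child */
static volatile sig_atomic_t group_gone;      /* set once waitpid reports ECHILD */
static volatile pid_t watched_pgid;           /* group the handler reaps from */

/* SIGCHLD handler: drain every exited child of the watched group without
 * blocking. 0 means nothing has exited yet; -1/ECHILD means we have no
 * children left in that group. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    (void)signo;
    for (;;) {
        int status;
        pid_t pid = waitpid(-watched_pgid, &status, WNOHANG);
        if (pid > 0) {
            if (WIFEXITED(status))
                last_exit_code = WEXITSTATUS(status);
            reaped_count++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;
        if (pid == -1 && errno == ECHILD)
            group_gone = 1;
        break;
    }
    errno = saved_errno;
}

/* Returns the intermediate child's exit code as seen by the handler,
 * or -1 on a setup error. */
int run_daemonizing_child(void)
{
    sigset_t block, old, wait_mask;
    struct sigaction sa, old_sa;

    reaped_count = 0;
    last_exit_code = -1;
    group_gone = 0;

    /* Block SIGCHLD so the handler cannot run before watched_pgid is set. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, &old_sa);

    pid_t child = fork();
    if (child == 0) {
        setpgid(0, 0);                 /* own group; parent does the same */
        pid_t grandchild = fork();
        if (grandchild == 0) {
            setsid();                  /* new session and group leader */
            close(0); close(1); close(2);
            _exit(0);                  /* a real daemon would keep running */
        }
        _exit(grandchild < 0 ? 1 : 7); /* intermediate child exits at once */
    }
    if (child > 0) {
        setpgid(child, child);
        watched_pgid = child;
        assert(watched_pgid > 0);
        /* sigsuspend atomically unblocks SIGCHLD and sleeps. */
        wait_mask = old;
        sigdelset(&wait_mask, SIGCHLD);
        while (reaped_count == 0 && !group_gone)
            sigsuspend(&wait_mask);
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return child > 0 ? (int)last_exit_code : -1;
}
```

The parent only ever sees the intermediate child's exit status. The grand-child leaves the group through setsid() and, once orphaned, is reaped by init rather than by us.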
Ignoring this still-unsolved problem of a possibly ill-behaved
child program that changes its process group, I started
constructing a proof-of-concept prototype. What I will do in the
prototype is start throwing IllegalStateException from the
methods of the Process API that pertain to such children. I
think this is reasonable.
Stay tuned,
Peter