> 
> On Tue, 31 Oct 2006 09:48:26 -0500 James Carlson
> wrote:
> > Glenn Fowler writes:
> > > but I think quality of implementation should push
> > > posix_spawn() implementers to find a way to make posix_spawn()
> > > fail immediately on exec error -- with real vfork this is possible
> > > we've done it in the spawnveg() implementation that uses real
> > > vfork/exec
>
> > So, then, let's see to it that posix_spawn is fixed in Solaris.

Mea culpa.  Yes, I implemented posix_spawn() in Solaris 10,
and I used the POSIX-approved kludge of having the vfork() child
call _exit(127) when it encountered a failure before or at execve().
This was a grievous fault.

I filed a bug against Solaris 11 (aka nevada):

    6488832 posix_spawn() should return better error codes

The fix uses the trick of having the vfork() child put the failing errno value
into a local variable that is picked up by the parent after the child exits.
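
As a rough illustration of that trick (a sketch only, not the actual
Solaris libc code), the vfork() child can store errno into a variable on
the parent's stack, because a real vfork() child borrows the parent's
address space until it calls exec or _exit().  The function name
spawn_report_errno() is hypothetical, and the plain waitpid() used to
reap the failed child here is exactly the racy approach discussed in the
next paragraph.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start `path` with vfork()/execve() and return 0
 * on success (child pid stored in *pidp) or the errno value that made
 * the exec fail.  Assumes a real vfork() that shares the parent's memory.
 */
int spawn_report_errno(pid_t *pidp, const char *path,
                       char *const argv[], char *const envp[])
{
    /* volatile: written by the child, read by the parent afterwards */
    volatile int child_errno = 0;
    pid_t pid = vfork();

    if (pid == -1)
        return errno;               /* vfork() itself failed */

    if (pid == 0) {
        /* Child: runs on the parent's stack until exec or _exit. */
        execve(path, argv, envp);
        child_errno = errno;        /* exec failed; record why */
        _exit(127);
    }

    /* Parent: resumes only after the child has exec'ed or exited. */
    if (child_errno != 0) {
        int status;
        /* Reap the failed child (racy if another thread is in wait()). */
        while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
            ;
        return child_errno;
    }

    *pidp = pid;
    return 0;
}
```

Called with a path that does not exist, this sketch would be expected to
return ENOENT instead of leaving the caller to discover an exit status
of 127 later.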

The more interesting part is how to dispose of the failed child.
An immediate call to waitid() in the posix_spawn() implementation
is the obvious answer, but this results in a possible race condition
in which some other thread in the process is sleeping in wait(),
waiting for any child to terminate, when posix_spawn() is called.
This other thread could receive the termination status of the failed
vfork() child before the thread executing in posix_spawn() calls
waitid() for the particular process-id.

It is necessary for the failed vfork() child to evaporate completely,
just as though the parent had set the disposition of SIGCHLD to be ignored.
This can be accomplished by having the failed child call _exit() with
a unique termination code, 0xffff0000.  This is an invalid code for
_exit() in general because only the lowest 8 bits are used to return
the termination status to the parent, and in this case that would be 0.
The combination of this unique exit code and the fact that the child is
still a child of vfork() that has not yet performed a successful exec()
is sufficient to tell the kernel to make the child evaporate (by calling
freeproc() rather than sigcld() in proc_exit()).
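
To see why 0xffff0000 cannot collide with a legitimate exit status, the
small sketch below forks an ordinary child, has it call _exit() with a
given code, and reports what the parent observes through waitpid().
The helper name observed_exit_status() is hypothetical, and the example
uses fork() on a stock kernel, so the child is still reported to the
parent; the "evaporate" behavior is specific to the Solaris kernel
change described above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that calls _exit(code) and return
 * the exit status the parent sees, or -1 on error.
 */
int observed_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: exit with the raw code */

    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    /* Only the low 8 bits of the code reach the parent. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing (int)0xffff0000 should yield 0 in the parent, since the low
8 bits are all zero, while an ordinary code such as 127 comes back
unchanged.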

I've tested the fix with the iffe-generated test and it now finds that
posix_spawn() exists and is worth using.

I'll have the fix put back into Solaris 11 (nevada) in build 53.
I'll also arrange to back-port the fix to some update of Solaris 10.

Of course, vfork() is a fine thing to use in a process with only a
single thread.  It is unsafe to use in a process with more than one
thread because it can result in parent/child deadlocks.
For this reason, multithreaded code should use posix_spawn()
rather than vfork().
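
For reference, a minimal sketch of the posix_spawn() pattern that
multithreaded code could use in place of vfork()/execve().  The helper
name spawn_and_wait() is hypothetical; posix_spawn() returns an error
number directly rather than setting errno.  Whether an exec failure is
reported through that return value or through a child exit status of
127 depends on the implementation, which is the subject of this thread.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: run `path` with `argv`, wait for it, and store
 * its exit status in *exit_status (-1 if it did not exit normally).
 * Returns 0 on success or an error number on failure.
 */
int spawn_and_wait(const char *path, char *const argv[], int *exit_status)
{
    pid_t pid;
    int status;

    /* No file actions or spawn attributes; inherit the environment. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return err;                 /* spawn failed; no child to reap */

    /* Wait for this specific child, retrying if interrupted. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return errno;
    }

    *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 0;
}
```

A caller might pass "/bin/sh" with the arguments "sh", "-c", "exit 3"
and would then expect *exit_status to be 3.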

FWIW: Only libc can be made to use vfork() safely in a multithreaded
environment, since only it can avoid the dynamic linking pitfalls.
Any code outside of libc must at least call vfork() ... execve()
and the call to execve() will invoke the dynamic linker and
possibly result in deadlock.

Roger Faulkner
 
 
This message posted from opensolaris.org
