On Fri, 28 Sep 2012 10:21:49 +0200 Cedric Blancher wrote:
> On 28 September 2012 07:44, Glenn Fowler <g...@research.att.com> wrote:
> >
> > { INIT ast-ksh } 2012-09-27 alphas posted to
> > www.research.att.com/sw/download/alpha/
> We experience a lot of failures with ast-ksh 2012-09-27 on Suse 12.2
> Linux and latest Fedora:
> test arith begins at 2012-09-28+08:51:50
> arith.sh[420]: compound var arithmetic failed
> arith.sh[421]: compound var arithmetic failed
> arith.sh[422]: compound var arithmetic failed
> arith.sh[423]: compound var arithmetic failed
> arith.sh[424]: compound var arithmetic failed
> arith.sh[425]: compound var arithmetic failed
> arith.sh[426]: compound var arithmetic failed
> test arith failed at 2012-09-28+08:51:50 with exit code 1 [ 201 tests 1 error ]
> test attributes begins at 2012-09-28+09:19:32
> attributes.sh[128]: attributes not cleared for script execution
> attributes.sh[133]: typeset -L should not be inherited
> test attributes failed at 2012-09-28+09:19:34 with exit code 1 [ 110 tests 1 error ]
> test attributes(shcomp) begins at 2012-09-28+09:19:34
> shcomp-attributes.ksh[128]: attributes not cleared for script execution
> shcomp-attributes.ksh[133]: typeset -L should not be inherited
> test attributes(shcomp) failed at 2012-09-28+09:19:36 with exit code 2 [ 110 tests 2 errors ]
> test basic begins at 2012-09-28+09:19:36
> basic.sh[165]: script not working
> basic.sh[171]: output file pointer not shared correctly
> basic.sh[198]: builtin replaces standard input pipe
> basic.sh[204]: $0 not correct for . script
> basic.sh[211]: nested scripts failed
> basic.sh[215]: scripts in subshells fail
> basic.sh[350]: piping into script fails
> basic.sh[359]: script pipe to shell fails
> blabla
>
> We've traced this down to the nonconforming glibc/Linux implementation
> of posix_spawn() - disabling it cures the problem on Linux. I
> crosschecked with the AIX build - it uses posix_spawn() the same way
> but without triggering any failures.
>
> I think this is a follow-up to
> http://marc.info/?l=ast-developers&m=134785274012526&w=2 - I can't
> agree with the assertion of Redhat's Michal Hlavinka that glibc
> posix_spawn() is right, because the current behaviour is IMO useless
> for use in a shell (hence the failures in the testsuite), and think a
> fix in glibc is still required.

to recap:

    grep _lib_posix_spawn arch/*/src/lib/libast/FEATURE/lib

there are 3 possible results

(1) not there => posix_spawn() unusable
(2) #define _lib_posix_spawn 2 => works with no workarounds
(3) #define _lib_posix_spawn 1 => works but posix_spawn() on an executable
    file that would fail with ENOEXEC via execve() creates a process that
    exits with status 127

our sol10.* systems have _lib_posix_spawn 1 and they work
so something else is going on
(we don't have a linux system with the new glibc posix_spawn())

it may be a timing problem with this logic in src/lib/libast/misc/spawnvex.c
(spawnvex() is new and the api has not settled yet)

    #if _lib_posix_spawn < 2
    if (waitpid(pid, &err, WNOHANG|WNOWAIT) == pid && EXIT_STATUS(err) == 127)
    {
        while (waitpid(pid, NiL, 0) == -1 && errno == EINTR);
        if (!access(path, X_OK))
            errno = ENOEXEC;
        pid = -1;
    }
    #endif

can you do an strace and see what the waitpid() is returning?
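For experimenting outside libast, here is a minimal standalone sketch of the same peek-then-reap idea. It is an illustration under stated assumptions, not the spawnvex() code. The name spawn_peek() is hypothetical. The sketch uses POSIX waitid() with WEXITED|WNOHANG|WNOWAIT instead of waitpid(), because POSIX specifies WNOWAIT for waitid(). It reads si_status in place of libast's EXIT_STATUS(). A child that has not exited by the time of the peek falls through and is returned as an ordinary pid, which is the timing window suspected above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical sketch (not the libast API): spawn path with posix_spawn(),
 * then peek once at the child's status without reaping it.
 * Returns the child pid, or -1 with errno set.
 */
static pid_t spawn_peek(const char *path, char *const argv[])
{
    pid_t pid;
    siginfo_t info;
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);

    if (rc != 0) {
        /* the libc reported the exec failure directly (the "2" case) */
        errno = rc;
        return -1;
    }

    /* si_pid stays 0 if WNOHANG finds nothing waitable yet */
    memset(&info, 0, sizeof info);

    /* WNOWAIT: look at an exited child but leave it waitable */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == 0
        && info.si_pid == pid
        && info.si_code == CLD_EXITED
        && info.si_status == 127) {
        /* reap the failed child, retrying if interrupted by a signal */
        while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
            ;
        /* same guess as the snippet: an executable file that "failed"
         * is assumed to be ENOEXEC; otherwise keep errno from access() */
        if (access(path, X_OK) == 0)
            errno = ENOEXEC;
        return -1;
    }

    /* timing window: a child that has not exited yet lands here */
    return pid;
}
```

The caller still owns any returned pid and must reap it with waitpid(). Because the wrapper only ever sees the exit status, a program that legitimately exits 127 before the peek looks exactly like an exec failure.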
my guess is that on solaris the child process has exited 127 on ENOEXEC
before the waitpid(pid, &err, WNOHANG|WNOWAIT), and on linux the process
has not yet exited
(but looking at build logs over the last week I see some spurious exit code
127 failures on solaris, so it looks like a timing problem even for solaris)

the standard allows a fork()/exec() style posix_spawn() to report ENOEXEC
by producing a child process that will eventually exit 127

I'm beginning to fear that there is no way to work around the timing window
(a sleep() before waitpid() would be dumb and not guaranteed to work anyway);
the posix_spawn() wrapper could check the magic number, but I don't want to
get into the magic number game, that's exec*()'s job

so if it is a timing window, the iffe test will have to fail posix_spawn()
implementations that create a child process for ENOEXEC, and if that's the
case it shows how useless posix_spawn() is, because the caller only knows
exit status 127, not the root of the problem

in the case of the shell calling posix_spawn(), it must know the reason for
the failure: ENOEXEC means the shell can attempt to treat the executable as
a script; not so for exit code 127

I just noticed that this code is not strictly portable because it relies on
the non-standard linux WNOWAIT without iffe-ing or #ifdef-ing it
for now it's ok (by luck) because _lib_posix_spawn=1 only on { linux solaris }
I'll modify the iffe test to only emit _lib_posix_spawn=1 if WNOWAIT is
defined; otherwise posix_spawn() is useless, because the spawnvex() wrapper
must not interfere with the caller's ability to wait on the spawned process
(but if we never get _lib_posix_spawn=1 this observation is moot)

_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers