Hi!

----

A few days ago Mike Kupfer found a crash while trying to build the
ksh93-integration prototype on an AMD64 machine. The "make install"
target in usr/src/cmd/ksh/ runs the ksh93/AST test suite, and ksh93
crashed in one of the test runs (arith.sh).

The crash sequence looks like this:
-- snip --
$ cd usr/src/cmd/ksh/amd64
$ make install
[snip]
# which
ksh='/home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/amd64/ksh',
ksh93='/home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/amd64/ksh93'
## Running ksh test: LANG='C' script='alias.sh'
## Running ksh test: LANG='C' script='append.sh'
## Running ksh test: LANG='C' script='arith.sh'
ksh[18]: 24269 Segmentation Fault(coredump)
*** Error code 139
-- snip --
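
For anyone wondering how "Error code 139" relates to the segmentation
fault: assuming the usual shell convention that a process killed by a
signal is reported as 128 plus the signal number, 139 decodes to
signal 11. The helper name signal_from_status below is made up for this
sketch and is not part of the ksh93 build:

```shell
# Decode a shell exit status into the number of the signal that killed
# the process. Assumption: the reporting shell uses "128 + signal number";
# signal_from_status is a hypothetical helper, not part of the build.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        # Not a signal death under this convention.
        echo 0
    fi
}

sig=$(signal_from_status 139)
echo "signal number: $sig"
# Ask the shell for the name of that signal.
kill -l "$sig"
```

Signal 11 is SIGSEGV on Solaris, which matches the "Segmentation Fault"
message printed by ksh.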

The stack trace looks like this:
-- snip --
$ dbx - core
Corefile specified executable:
"/home/test001/ksh93/on_build1/test1_x86/proto/root_i386/usr/bin/amd64/ksh93"
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in
your .dbxrc
Reading ksh93
core file header read successfully
Reading ld.so.1
Reading libshell.so.1
Reading libc.so.1
Reading libcmd.so.1
Reading libdll.so.1
Reading libast.so.1
Reading libsecdb.so.1
Reading libm.so.2
Reading libsocket.so.1
Reading libnsl.so.1
program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7fff357eae: expr+0x004e:        cmpl    0x000000000003ef1f(%rbx),%r8d
(dbx) where
=>[1] expr(0xfffffd7fffdfe948, 0x1f, 0x28, 0xfffffd7fff37048e, 0xfffffd7fff357ed2, 0x7), at 0xfffffd7fff357eae
  [2] arith_compile(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff359136 
  [3] sh_arithcomp(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3271cf 
  [4] getanode(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d236 
  [5] item(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34f1d5 
  [6] term(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34da7f 
  [7] list(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d9b4 
  [8] sh_cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d888 
  [9] item(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34f12d 
  [10] term(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34da7f 
  [11] list(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d9b4 
  [12] sh_cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d888 
  [13] sh_parse(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d530 
  [14] exfile(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3235ff 
  [15] sh_main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff322f22 
  [16] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4009d5 
-- snip --

My compiler version is:
-- snip --
$ cc -V
cc: Sun C 5.7 Patch 117837-04 2005/05/11
usage: cc [ options] files.  Use 'cc -flags' for details
$ CC -V
CC: Sun C++ 5.7 Patch 117831-02 2005/03/30
$ uname -a
SunOS hal-9000 5.11 snv_43 i86pc i386 i86pc SunOS
-- snip --

I suspect this may be a compiler bug in Sun Studio 10 (CC:'ing
"Christopher D. Quenelle" <Chris.Quenelle at Sun.COM>). The issue is
AMD64-specific: neither the 32bit binaries (i86, sparc) nor the 64bit
sparcv9 binary show this problem. Turning the optimisation off cures
it, as described in the commit for the workaround (see
http://polaris.blastwave.org/changeset/391). Since then a refined
version of the workaround has been applied
(http://polaris.blastwave.org/changeset/400) which narrows the problem
down to the corresponding source file.

* Steps to reproduce:
1. Pull the sources and extract the closed binaries:
$ svn checkout -r 400 \
    svn://svn.genunix.org/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr
$ bzcat ../download/on-closed-bins-b37.i386.tar.bz2 | tar -xf -

2. Run "bldenv":
$ cd .. ; env - SHELL=$SHELL TERM=$TERM HOME=$HOME LOGNAME=$LOGNAME \
    DISPLAY=$DISPLAY LANG=C LC_ALL=C PAGER=less MANPATH=$MANPATH \
    /opt/onbld/bin/bldenv opensolaris.sh

3. Build it (the quick way):
$ cd test_x86/usr/src
$ time nice make setup 2>&1 | tee -a buildlog_setup.log
$ time nice dmake install >buildlog.log 2>&1

4. Back out the workarounds added with
http://polaris.blastwave.org/changeset/400 and
http://polaris.blastwave.org/changeset/391

5. Recompile libshell:
$ cd lib/libshell ; make install

6a. Run tests (normal way):
$ cd ../../cmd/ksh/amd64 ; make install

  OR

6b. Run tests manually (NOT RECOMMENDED):
% (LD_LIBRARY_PATH=$ROOT/lib/amd64 $ROOT/usr/bin/amd64/ksh93 \
    $SRC/lib/libshell/common/tests/arith.sh )

I uploaded a sample core dump from a snv_43 build machine to
http://www.opensolaris.org/os/project/ksh93-integration/downloads/ksh93_integration_20060821_amd64_crash_in_arith_sh.core.bz2
(MD5(ksh93_integration_20060821_amd64_crash_in_arith_sh.core.bz2)=
10357a20d58ff17132de914e10896515) for further analysis by someone who
may know AMD64 assembler better than I do...
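
To check the download against the MD5 value above, something like the
following sketch should do. It assumes that either md5sum or openssl is
in $PATH (on Solaris, digest -a md5 would be another option); the
function names md5_of and verify_md5 are made up for this example:

```shell
# Print the MD5 digest of a file as a bare hex string.
# Assumption: md5sum or openssl is installed; md5_of and verify_md5
# are hypothetical helper names, not existing tools.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        # md5sum prints "<hash>  -" when reading stdin
        md5sum < "$1" | awk '{print $1}'
    else
        # openssl prints "...(stdin)= <hash>"; the hash is the last field
        openssl md5 < "$1" | awk '{print $NF}'
    fi
}

# Succeed only if the file's digest equals the expected value.
verify_md5() {
    [ "$(md5_of "$1")" = "$2" ]
}

core=ksh93_integration_20060821_amd64_crash_in_arith_sh.core.bz2
if [ -f "$core" ]; then
    if verify_md5 "$core" 10357a20d58ff17132de914e10896515; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH"
    fi
fi
```

The check is skipped silently if the file has not been downloaded into
the current directory.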

... help/suggestions/comments/rants/etc. welcome... :-)

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
