Just some thoughts on the subject here:

For the sake of discussion, I'll assume we are dealing with a compiler bug,
but we won't know for sure until we have the code-generation
bug in front of us.  Until then, it could also be a bug in the source
that results in platform-dependent code.

--------

I've never built Solaris before, and I'm not familiar with the
scripts and tools, but I can describe the steps that would be helpful
to isolate the bug.

1. Try building all parts of the call stack with -g and no -O,
except for the file that causes the bug.  Compile that file
with -g -O and try to reproduce the problem.
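
As a rough sketch of what I mean (not taken from the Solaris makefiles,
which I don't know): the file names below are hypothetical placeholders,
and the loop only prints the compile lines rather than running them, so
it assumes nothing about your build beyond the -g, -O and -c flags.

```shell
# Hypothetical name: the one file believed to trigger the bad code.
SUSPECT="suspect.c"

# Print the flags to use for one source file:
# debug info everywhere, optimisation only for the suspect file.
flags_for() {
    if [ "$1" = "$SUSPECT" ]; then
        echo "-g -O"
    else
        echo "-g"
    fi
}

# Hypothetical list of the files that make up the call stack.
for f in caller1.c suspect.c caller2.c; do
    echo "cc $(flags_for "$f") -c $f"
done
```

If the crash still reproduces with only that one file optimised, that
is a strong hint about where to look.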

2. If you can create a set of sources that can be compiled
away from the rest of the Solaris/ksh environment, that
will make it go much faster when a compiler person looks at
the problem.

For example (assuming the bug is in expr): if you can
set up the data structures and parameters to 'expr' in a sample
program, and then supply that test harness program together with
the CPP-processed source file containing expr, the problem will be
much easier to reproduce.
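
Something along these lines might do for packaging it up.  This is only
a sketch for a POSIX shell such as ksh: make_repro, suspect.c, harness.c
and repro.tar are names I made up, and you would have to pass whatever
-I and -D options the real build uses for that file.

```shell
# Compiler to use; override CC in the environment if needed.
CC=${CC:-cc}

# Preprocess the suspect file so it no longer depends on the build
# tree's headers, then bundle the result with the harness program.
# Any extra arguments are handed to the compiler unchanged.
make_repro() {
    src=$1
    harness=$2
    shift 2
    "$CC" -E "$@" "$src" > "${src%.c}.i" || return 1
    tar -cf repro.tar "${src%.c}.i" "$harness"
}

# Usage (hypothetical options):
#   make_repro suspect.c harness.c -I../include -DSOME_MACRO
```

The resulting tar file, plus the exact compile line that shows the
failure, is the kind of thing that can be attached to a bug report.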

We now have an interface on bugs.sun.com that allows you to
submit bugs against the Sun Studio compilers.  In describing
the problem, it would be best if you assumed that anyone evaluating
the bug is not familiar with any part of the Solaris build process.

There is also a very recent pre-release version of the compiler
available as "Sun Studio Express 2".  You might try downloading
this newer compiler and using it only on the source file causing
trouble.  If that fixes the problem, then that information will be
very useful to have in the bug description.

--chris


Roland Mainz wrote:
> Hi!
> 
> ----
> 
> A few days ago Mike Kupfer found a crash while trying to build the
> ksh93-integration prototype on an AMD64 machine. The "make install"
> target in usr/src/cmd/ksh/ runs the ksh93/AST test suite which crashed
> in one of the test runs.
> 
> The crash sequence looks like this:
> -- snip --
> $ cd usr/src/cmd/ksh/amd64
> $ make install
> [snip]
> # which
> ksh='/home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/amd64/ksh',
> ksh93='/home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/amd64/ksh93'
> ## Running ksh test: LANG='C' script='alias.sh'
> ## Running ksh test: LANG='C' script='append.sh'
> ## Running ksh test: LANG='C' script='arith.sh'
> ksh[18]: 24269 Segmentation Fault(coredump)
> *** Error code 139
> -- snip --
> 
> The stack trace looks like this:
> -- snip --
> $ dbx - core
> Corefile specified executable:
> "/home/test001/ksh93/on_build1/test1_x86/proto/root_i386/usr/bin/amd64/ksh93"
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.4' in
> your .dbxrc
> Reading ksh93
> core file header read successfully
> Reading ld.so.1
> Reading libshell.so.1
> Reading libc.so.1
> Reading libcmd.so.1
> Reading libdll.so.1
> Reading libast.so.1
> Reading libsecdb.so.1
> Reading libm.so.2
> Reading libsocket.so.1
> Reading libnsl.so.1
> program terminated by signal SEGV (no mapping at the fault address)
> 0xfffffd7fff357eae: expr+0x004e:        cmpl    0x000000000003ef1f(%rbx),%r8d
> (dbx) where
> =>[1] expr(0xfffffd7fffdfe948, 0x1f, 0x28, 0xfffffd7fff37048e, 0xfffffd7fff357ed2, 0x7), at 0xfffffd7fff357eae 
>   [2] arith_compile(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff359136 
>   [3] sh_arithcomp(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3271cf 
>   [4] getanode(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d236 
>   [5] item(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34f1d5 
>   [6] term(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34da7f 
>   [7] list(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d9b4 
>   [8] sh_cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d888 
>   [9] item(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34f12d 
>   [10] term(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34da7f 
>   [11] list(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d9b4 
>   [12] sh_cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d888 
>   [13] sh_parse(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff34d530 
>   [14] exfile(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3235ff 
>   [15] sh_main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff322f22 
>   [16] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4009d5 
> -- snip --
> 
> My compiler version is:
> -- snip --
> $ cc -V
> cc: Sun C 5.7 Patch 117837-04 2005/05/11
> usage: cc [ options] files.  Use 'cc -flags' for details
> $ CC -V
> CC: Sun C++ 5.7 Patch 117831-02 2005/03/30
> $ uname -a
> SunOS hal-9000 5.11 snv_43 i86pc i386 i86pc SunOS
> -- snip --
> 
> I suspect this may be a compiler bug in Sun Studio 10 (CC:'ing
> "Christopher D. Quenelle" <Chris.Quenelle at Sun.COM>), since this issue is
> AMD64-specific: neither the 32bit binaries (i86, sparc) nor the
> 64bit sparcv9 binary show this problem, and turning the optimisation off
> cures it, as described in the commit for the workaround (see
> http://polaris.blastwave.org/changeset/391). Since then a refined
> version of the workaround has been applied
> (http://polaris.blastwave.org/changeset/400) which narrows down the
> problem to the corresponding source file.
> 
> * Steps to reproduce:
> 1. Pull sources and extract closed bin stuff:
> $ svn checkout -r 400 svn://svn.genunix.org/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr
> $ bzcat ../download/on-closed-bins-b37.i386.tar.bz2 | tar -xf -
> 
> 2. Run "bldenv":
> $ cd .. ; env - SHELL=$SHELL TERM=$TERM HOME=$HOME LOGNAME=$LOGNAME
> DISPLAY=$DISPLAY LANG=C LC_ALL=C PAGER=less MANPATH=$MANPATH
> /opt/onbld/bin/bldenv opensolaris.sh
> 
> 3. Build it (the quick way):
> $ cd test_x86/usr/src
> $ time nice make setup 2>&1 | tee -a buildlog_setup.log
> $ time nice dmake install >buildlog.log 2>&1
> 
> 4. Back out the workaround added with
> http://polaris.blastwave.org/changeset/400 and
> http://polaris.blastwave.org/changeset/391
> 
> 5. Recompile libshell:
> $ cd lib/libshell ; make install
> 
> 6a. Run tests (normal way):
> $ cd ../../cmd/ksh/amd64 ; make install
> 
>   OR
> 
> 6b. Run tests manually (NOT RECOMMENDED):
> % (LD_LIBRARY_PATH=$ROOT/lib/amd64 $ROOT/usr/bin/amd64/ksh93
> $SRC/lib/libshell/common/tests/arith.sh )
> 
> I uploaded a sample core dump from a snv_43 build machine to
> http://www.opensolaris.org/os/project/ksh93-integration/downloads/ksh93_integration_20060821_amd64_crash_in_arith_sh.core.bz2
> (MD5(ksh93_integration_20060821_amd64_crash_in_arith_sh.core.bz2)=
> 10357a20d58ff17132de914e10896515) for further analysis by someone who
> may know AMD64 assembler better than I do...
> 
> ... help/suggestions/comments/rants/etc. welcome... :-)
> 
> ----
> 
> Bye,
> Roland
> 

