I could really use some help from the gcc experts.

Please file a Bugzilla report, or an entry in another well-known tracker.
In particular, Bugzilla asks about reproducibility, version numbers, etc.
Failing to reproduce a problem because versions are unspecified or
mismatched is frustrating.

A package I maintain, cyrus-imapd, contains two extensive test suites
which we run at package build time.  After the big flag day where we
updated gcc and glibc and such in rawhide, one of the test suites now
shows failures and produces 22 core dumps, but only when run in mock
(not even fedpkg local on a rawhide container).  Even in mock, if I
get into the chroot, duplicate the test environment and run the failing
program by hand (or under strace, or under gdb) then it doesn't

What does running under memcheck ("valgrind --track-origins=yes ...") say?
The reported behavior is consistent with use of an uninitialized value.
[gdb changes the environment by adding two pipes when invoking a process.]

After getting cores and all of the debugging stuff into mock
(instructions below) I found that all cores have substantially identical

(gdb) bt
#0  0x0000000000000120 in ?? ()
#1  0x00007f18a19d281e in _Unwind_ForcedUnwind_Phase2 (exc=0x7fffbc364c70, context=0x7fffbc364990, frames_p=0x7fffbc364898) at 
#2  0x00007f18a19d3105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3  0x00007f18a7dbbb90 in stem_version_set (version=<optimized out>, database=<optimized out>) at /usr/include/c++/8/bits/char_traits.h:320
#4  xapian_dbw_open (paths=0x55aff951eb70, dbwp=0x55aff951f0f8) at 

Looking at the code:
===== gcc/libgcc/unwind.inc
 _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
                              struct _Unwind_Context *context,
                              unsigned long *frames_p)
   _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
 <<skip to line 170:>>
       stop_code = (*stop) (1, action, exc->exception_class, exc,
                            context, stop_argument);
We see that the function pointer 'stop' is cast from an untyped word,
'private_1', with no checking at all, not even for NULL or < PAGE_SIZE.
This is a giant red flag for unreliable code.
Such a check would have avoided the particular SIGSEGV in the traceback
above, where frame #0 sits at 0x120, well below PAGE_SIZE.
Of course, bailing out on a failed check might itself give vague or
incorrect results, but it could offer strong hints about what to fix,
instead of just a bare SIGSEGV.
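
A minimal sketch of the kind of check I mean. This is not the libgcc code:
struct demo_exception, demo_stop_fn, demo_call_stop, demo_stop and
DEMO_PAGE_SIZE are all made-up names for illustration, and the 4096
threshold assumes that the lowest page is never mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the stop-function type; not libgcc's typedef. */
typedef int (*demo_stop_fn)(int version, int actions, void *exc,
                            void *context, void *stop_argument);

/* Hypothetical stand-in for the exception header: two untyped words. */
struct demo_exception {
    uintptr_t private_1;   /* stop function, stored as a plain word */
    uintptr_t private_2;   /* argument passed through to the stop function */
};

/* Assumption: no code can live below this address. */
#define DEMO_PAGE_SIZE 4096u

enum { DEMO_OK = 0, DEMO_BAD_STOP = -1 };

/* Validate the stored word before calling through it.
   On success, stores the callee's result in *stop_code. */
int demo_call_stop(struct demo_exception *exc, void *context, int *stop_code)
{
    assert(exc != NULL && stop_code != NULL);

    if (exc->private_1 < DEMO_PAGE_SIZE) {
        /* Covers NULL and small garbage values such as 0x120. */
        fprintf(stderr, "forced unwind: implausible stop function %#lx\n",
                (unsigned long) exc->private_1);
        return DEMO_BAD_STOP;
    }

    demo_stop_fn stop = (demo_stop_fn) exc->private_1;
    *stop_code = stop(1, 0, exc, context, (void *) exc->private_2);
    return DEMO_OK;
}

/* Trivial stop function used only to exercise the happy path. */
int demo_stop(int version, int actions, void *exc,
              void *context, void *stop_argument)
{
    (void) version; (void) actions; (void) exc;
    (void) context; (void) stop_argument;
    return 42;
}
```

With private_1 set to 0x120, demo_call_stop prints a diagnostic and returns
DEMO_BAD_STOP instead of jumping to 0x120. A real fix would still have to
decide what the unwinder should do next (abort, most likely); the point is
only that the diagnostic names the corrupted field.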