#3295: Null deref by threaded runtime scheduler
-----------------------------------------------------+----------------------
Reporter:  A1kmm                                     |          Owner:
    Type:  bug                                       |         Status:  new
Priority:  normal                                    |      Component:  Runtime System
 Version:  6.11                                      |       Severity:  major
Keywords:  crash, nullderef, threaded, parallel, GC  |       Testcase:
      Os:  Linux                                     |   Architecture:  ia64
-----------------------------------------------------+----------------------
 Using GHC and its runtime built from HEAD on Tuesday (this has also been
 an issue with older builds I tried), the GHC runtime crashes with a null
 pointer dereference.

 The system is a 24-processor Intel Xeon 2.66 GHz shared memory system
 running Linux 2.6.16.60-0.34-smp in 64 bit mode (from SUSE 10 SP2).

 This only happens when the program is compiled with -threaded (the
 threaded runtime); it does not happen with the threaded debug runtime.

 It also only happens when +RTS -N2 (or a higher number of threads) is
 passed to the runtime. It does not happen when +RTS -g1 is passed to turn
 off parallel garbage collection.

 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 1241594176 (LWP 20751)]
 0x000000000067262c in schedule (initialCapability=<value optimized out>, task=0x994370) at rts/Schedule.c:672
 672         return (waiting_for_gc ||
 (gdb) list
 667         //   - another thread is initiating a GC
 668         //   - another Task is returning from a foreign call
 669         //   - the thread at the head of the run queue cannot be run
 670         //     by this Task (it is bound to another Task, or it is unbound
 671         //     and this task it bound).
 672         return (waiting_for_gc ||
 673                 cap->returning_tasks_hd != NULL ||
 674                 (!emptyRunQueue(cap) && (task->tso == NULL
 675                                          ? cap->run_queue_hd->bound != NULL
 676                                          : cap->run_queue_hd->bound != task)));
 (gdb) print cap
 $1 = (Capability *) 0x0
 (gdb) print *task
 $2 = {id = 1241594176, cap = 0x0, stopped = 7160552, suspended_tso = 0x0,
 tso = 0x7fccc4, stat = NoStatus, ret = 0x0, cond = {__data = {__lock = 1,
 __futex = 0,
       __total_seq = 1, __wakeup_seq = 0, __woken_seq = 0, __mutex =
 0x84e340, __nwaiters = 10033872, __broadcast_seq = 0},
     __size = "\001\000\000\000\000\000\000\000\001", '\0' <repeats 23
 times>, "@�\204\000\000\000\000\000�\032\231\000\000\000\000", __align =
 1}, lock = {__data = {
       __lock = 0, __count = 1, __owner = 0, __nusers = 0, __kind = 22668,
 __spins = 0, __list = {__prev = 0x5793, __next = 0x85afd}},
    __size = "\000\000\000\000\001", '\0' <repeats 11 times>, "\214X\000\000\000\000\000\000\223W\000\000\000\000\000\000�Z\b\000\000\000\000",
 __align = 4294967296},
   wakeup = 547420, elapsedtimestart = 202, muttimestart = 0, mut_time = 0,
 mut_etime = 0, gc_time = 0, gc_etime = 10033296, prev = 0x994370, next =
 0x0, return_link = 0x0,
   all_link = 0x0, prev_stack = 0x9945c0}
 (gdb) info threads
 * 21 Thread 1241594176 (LWP 20751)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x994370) at
 rts/Schedule.c:672
   20 Thread 1233201472 (LWP 20750)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b4930) at
 rts/Schedule.c:672
   19 Thread 1224808768 (LWP 20749)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b2df0) at
 rts/Schedule.c:672
   18 Thread 1216416064 (LWP 20748)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b12b0) at
 rts/Schedule.c:672
   17 Thread 1208023360 (LWP 20747)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9af770) at
 rts/Schedule.c:672
   16 Thread 1199630656 (LWP 20746)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9adc30) at
 rts/Schedule.c:672
   15 Thread 1191237952 (LWP 20745)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9ac0f0) at
 rts/Schedule.c:672
   14 Thread 1182845248 (LWP 20744)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9aa5b0) at
 rts/Schedule.c:672
   13 Thread 1174452544 (LWP 20743)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a8a70) at
 rts/Schedule.c:672
   12 Thread 1166059840 (LWP 20742)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a6f30) at
 rts/Schedule.c:672
   11 Thread 1157667136 (LWP 20741)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a53f0) at
 rts/Schedule.c:672
   10 Thread 1149274432 (LWP 20740)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a38b0) at
 rts/Schedule.c:672
   9 Thread 1140881728 (LWP 20739)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a1d70) at
 rts/Schedule.c:672
   8 Thread 1132489024 (LWP 20738)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a0230) at
 rts/Schedule.c:672
   7 Thread 1124096320 (LWP 20737)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99e6f0) at
 rts/Schedule.c:672
   6 Thread 1115703616 (LWP 20736)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99cbb0) at
 rts/Schedule.c:672
   5 Thread 1107310912 (LWP 20735)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99b070) at
 rts/Schedule.c:672
   4 Thread 1098918208 (LWP 20734)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x999530) at
 rts/Schedule.c:672
   3 Thread 1090525504 (LWP 20733)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9979f0) at
 rts/Schedule.c:672
   2 Thread 1082132800 (LWP 20732)  0x00002aae2fa22548 in
 __lll_mutex_lock_wait () from /lib64/libpthread.so.0
   1 Thread 46927615298704 (LWP 20729)  0x00002aae2fa20553 in
 pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
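
 To make the failure at Schedule.c:672 concrete, here is a minimal C sketch that models only the fields used by that expression. The types Capability, Task and StgTSO below are hypothetical, heavily simplified stand-ins, not the real RTS definitions, and should_yield_unchecked and cap_is_missing are illustrative helper names. Because || short-circuits, a NULL cap is only survived when waiting_for_gc is already true; otherwise the second operand reads cap->returning_tasks_hd and faults, which matches `print cap` showing 0x0 (task->cap is also 0x0 in the dump).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the RTS types. Only the
 * fields used by the expression at Schedule.c:672 are modelled. */
typedef struct Task_ Task;

typedef struct StgTSO_ {
    Task *bound;                 /* Task this thread is bound to, or NULL */
} StgTSO;

typedef struct Capability_ {
    Task   *returning_tasks_hd;  /* Tasks returning from foreign calls */
    StgTSO *run_queue_hd;        /* head of the run queue, NULL if empty */
} Capability;

struct Task_ {
    StgTSO     *tso;             /* non-NULL for a bound Task */
    Capability *cap;             /* 0x0 in both crash dumps */
};

static bool empty_run_queue(const Capability *cap)
{
    return cap->run_queue_hd == NULL;
}

/* Same shape as the condition at Schedule.c:672. Every operand after
 * waiting_for_gc dereferences cap, so with cap == NULL this only avoids
 * a fault when waiting_for_gc is already true (|| short-circuits). */
static bool should_yield_unchecked(const Capability *cap, const Task *task,
                                   bool waiting_for_gc)
{
    assert(task != NULL);
    return waiting_for_gc ||
           cap->returning_tasks_hd != NULL ||
           (!empty_run_queue(cap) &&
            (task->tso == NULL ? cap->run_queue_hd->bound != NULL
                               : cap->run_queue_hd->bound != task));
}

/* Hypothetical diagnostic: report a missing Capability before the
 * expression above is evaluated, instead of segfaulting inside it. */
static bool cap_is_missing(const Capability *cap, const Task *task)
{
    if (cap == NULL) {
        fprintf(stderr, "schedule: task %p has no Capability\n", (void *)task);
        return true;
    }
    return false;
}
```

 A check like cap_is_missing only turns the segfault into a readable message; it does not explain why a worker Task reached this point without a Capability, which is the actual bug.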

 The contents of *task and the thread in which the crash occurs vary
 between runs, but the stack trace is always the same. For example, here
 is another crash:

 (gdb) bt
 #0  0x000000000067262c in schedule (initialCapability=<value optimized out>, task=0x9b4930) at rts/Schedule.c:672
 #1  0x0000000000673205 in workerStart (task=0x9915e0) at rts/Schedule.c:2033
 #2  0x00002aae0cd1a143 in start_thread () from /lib64/libpthread.so.0
 #3  0x00002aae0ceee8cd in clone () from /lib64/libc.so.6
 #4  0x0000000000000000 in ?? ()
 (gdb) print *task
 $6 = {id = 1233201472, cap = 0x0, stopped = 11791697, suspended_tso = 0x0,
 tso = 0x25ed0f1, stat = NoStatus, ret = 0x0, cond = {__data = {__lock = 1,
 __futex = 0,
       __total_seq = 1, __wakeup_seq = 0, __woken_seq = 0, __mutex =
 0x84e800, __nwaiters = 10033872, __broadcast_seq = 0},
     __size = "\001\000\000\000\000\000\000\000\001", '\0' <repeats 24
 times>, "�\204\000\000\000\000\000�\032\231\000\000\000\000", __align =
 1}, lock = {__data = {__lock = 0,
       __count = 1, __owner = 0, __nusers = 0, __kind = 875, __spins = 0,
 __list = {__prev = 0x33a, __next = 0x2da2b}},
    __size = "\000\000\000\000\001", '\0' <repeats 11 times>, "k\003\000\000\000\000\000\000:\003\000\000\000\000\000\000+�\002\000\000\000\000",
 __align = 4294967296},
   wakeup = 186902, elapsedtimestart = 24, muttimestart = 0, mut_time = 0,
 mut_etime = 0, gc_time = 0, gc_etime = 10033296, prev = 0x9b4930, next =
 0x0, return_link = 0x0,
   all_link = 0x0, prev_stack = 0x9b4b80}
 (gdb) info threads
   22 Thread 1249986880 (LWP 20846)  0x00002aae0cd20548 in
 __lll_mutex_lock_wait () from /lib64/libpthread.so.0
   21 Thread 1241594176 (LWP 20845)  0x00002aae0cd1e1c6 in
 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 * 20 Thread 1233201472 (LWP 20844)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b4930) at
 rts/Schedule.c:672
   19 Thread 1224808768 (LWP 20843)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b2df0) at
 rts/Schedule.c:672
   18 Thread 1216416064 (LWP 20842)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9b12b0) at
 rts/Schedule.c:672
   17 Thread 1208023360 (LWP 20841)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9af770) at
 rts/Schedule.c:672
   16 Thread 1199630656 (LWP 20840)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9adc30) at
 rts/Schedule.c:672
   15 Thread 1191237952 (LWP 20839)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9ac0f0) at
 rts/Schedule.c:672
   14 Thread 1182845248 (LWP 20838)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9aa5b0) at
 rts/Schedule.c:672
   13 Thread 1174452544 (LWP 20837)  gcWorkerThread (cap=<value optimized
 out>) at includes/SpinLock.h:45
   12 Thread 1166059840 (LWP 20836)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a6f30) at
 rts/Schedule.c:672
   11 Thread 1157667136 (LWP 20835)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a53f0) at
 rts/Schedule.c:672
   10 Thread 1149274432 (LWP 20834)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a38b0) at
 rts/Schedule.c:672
   9 Thread 1140881728 (LWP 20833)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a1d70) at
 rts/Schedule.c:672
   8 Thread 1132489024 (LWP 20832)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9a0230) at
 rts/Schedule.c:672
   7 Thread 1124096320 (LWP 20831)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99e6f0) at
 rts/Schedule.c:672
   6 Thread 1115703616 (LWP 20830)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99cbb0) at
 rts/Schedule.c:672
   5 Thread 1107310912 (LWP 20829)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x99b070) at
 rts/Schedule.c:672
   4 Thread 1098918208 (LWP 20827)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x999530) at
 rts/Schedule.c:672
   3 Thread 1090525504 (LWP 20826)  0x000000000067262c in schedule
 (initialCapability=<value optimized out>, task=0x9979f0) at
 rts/Schedule.c:672
   2 Thread 1082132800 (LWP 20825)  0x00002aae0cee86e2 in select () from
 /lib64/libc.so.6
   1 Thread 46927031233680 (LWP 20824)  0x00002aae0cd1c2ef in
 pthread_mutex_lock () from /lib64/libpthread.so.0

 Valgrind's memcheck tool reports no errors before the NULL dereference itself:
 ==20983== Thread 2:
 ==20983== Invalid read of size 8
 ==20983==    at 0x67262C: schedule (Schedule.c:672)
 ==20983==    by 0x673204: workerStart (Schedule.c:2033)
 ==20983==    by 0x58E4142: start_thread (in /lib64/libpthread-2.4.so)
 ==20983==    by 0x5AB78CC: clone (in /lib64/libc-2.4.so)
 ==20983==  Address 0x1d0 is not stack'd, malloc'd or (recently) free'd
 ==20983==
 ==20983== Process terminating with default action of signal 11 (SIGSEGV)
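
 The faulting address 0x1d0 (rather than 0x0) is consistent with reading a field that lives 0x1d0 bytes into the Capability structure through a NULL cap pointer. The sketch below assumes a made-up layout (FakeCapability, a hypothetical name) padded so that returning_tasks_hd lands at that offset; which field really sits at 0x1d0 in this build has not been checked, but gdb could show it with `print &((Capability *)0)->returning_tasks_hd`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up layout: the padding exists only to put the field at offset
 * 0x1d0. It is NOT the real Capability definition. */
typedef struct {
    char  other_fields[0x1d0];
    void *returning_tasks_hd;
} FakeCapability;

_Static_assert(offsetof(FakeCapability, returning_tasks_hd) == 0x1d0,
               "padding should place the field at offset 0x1d0");

/* Address a load of cap->returning_tasks_hd touches for a given value of
 * cap, computed arithmetically so nothing is actually dereferenced. */
static uintptr_t field_address(uintptr_t cap_value)
{
    return cap_value + offsetof(FakeCapability, returning_tasks_hd);
}
```

 With cap_value == 0, field_address returns 0x1d0, the same small address memcheck reports as not stack'd, malloc'd or free'd.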

 Valgrind's helgrind tool reports numerous possible data races, including
 one between schedule and GarbageCollect just before the crash:
 ==21106== Possible data race during read of size 8 at 0x5ccc300 by thread #5
 ==21106==    at 0x672612: schedule (Schedule.c:367)
 ==21106==    by 0x673204: workerStart (Schedule.c:2033)
 ==21106==    by 0x4A24A1E: mythread_wrapper (hg_intercepts.c:194)
 ==21106==    by 0x58E7142: start_thread (in /lib64/libpthread-2.4.so)
 ==21106==    by 0x5ABA8CC: clone (in /lib64/libc-2.4.so)
 ==21106==  This conflicts with a previous write of size 8 by thread #1
 ==21106==    at 0x679B8D: GarbageCollect (SpinLock.h:55)
 ==21106==    by 0x6712D0: scheduleDoGC (Schedule.c:1522)
 ==21106==    by 0x672C77: schedule (Schedule.c:621)
 ==21106==    by 0x66FF84: real_main (RtsMain.c:68)
 ==21106==    by 0x67009D: hs_main (RtsMain.c:117)
 ==21106==    by 0x5A17183: (below main) (in /lib64/libc-2.4.so)
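
 Helgrind only understands synchronisation it can see (pthread primitives and its own client-request annotations), so unannotated hand-rolled spinlocks like the one inlined from SpinLock.h can produce false positives, and it is not established that this particular race is what leaves cap NULL. For reference, the sketch below shows one conventional way to make a flag that one thread writes and another polls race-free, using C11 atomics with release/acquire ordering. gc_in_progress, gc_thread, worker_thread and run_demo are hypothetical names; the report does not say which variable is read at Schedule.c:367.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for scheduler state that GC code writes
 * and schedule() reads; the real variable is not identified here. */
static atomic_bool gc_in_progress = false;

static void *gc_thread(void *arg)
{
    (void)arg;
    /* Release store: earlier writes by this thread become visible to any
     * thread whose acquire load observes true. */
    atomic_store_explicit(&gc_in_progress, true, memory_order_release);
    return NULL;
}

static void *worker_thread(void *arg)
{
    bool *seen = arg;
    /* Acquire load instead of a plain read, so there is no data race. */
    while (!atomic_load_explicit(&gc_in_progress, memory_order_acquire))
        ;   /* spin until the GC thread publishes the flag */
    *seen = true;
    return NULL;
}

/* Starts one worker and one "GC" thread; returns true once the worker
 * has observed the flag. */
static bool run_demo(void)
{
    pthread_t worker, gc;
    bool seen = false;

    if (pthread_create(&worker, NULL, worker_thread, &seen) != 0)
        return false;
    if (pthread_create(&gc, NULL, gc_thread, NULL) != 0) {
        /* Publish the flag ourselves so the spinning worker can exit. */
        atomic_store_explicit(&gc_in_progress, true, memory_order_release);
        pthread_join(worker, NULL);
        return false;
    }
    pthread_join(gc, NULL);
    pthread_join(worker, NULL);
    return seen;
}
```

 Whether schedule() should instead take the same lock that GarbageCollect holds is a design question for the RTS; the sketch only illustrates the race-free access pattern.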

 I am still trying to produce a small program that reproduces this.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3295>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
