#4922: Segfault / Assertion failed in RTS (Compact.c)
-------------------------------+--------------------------------------------
Reporter: dleuschner | Owner:
Type: bug | Status: new
Priority: normal | Component: Runtime System
Version: 7.0.1 | Keywords:
Testcase: | Blockedby:
Os: Linux | Blocking:
Architecture: x86_64 (amd64) | Failure: Runtime crash
-------------------------------+--------------------------------------------
Our application terminates with a segfault or an internal RTS error in
about 80% of our test runs when we use the following runtime flags:
{{{
+RTS -G4 -H1g -c -I0
}}}
Without these flags the application runs fine. We only discovered the
problem after making many performance improvements to our code and running
stress tests on fast CPUs with many cores.
We compiled with the debugging runtime and got the following assertion
failure:
{{{
SalviaDerivationGateway: internal error: ASSERTION FAILED: file rts/sm/Compact.c, line 171
(GHC version 7.0.1.20110121 for x86_64_unknown_linux)
Please report this as a GHC bug:
http://www.haskell.org/ghc/reportabug
}}}
We're testing with a custom GHC build from the GHC 7.0 branch (including
patches up to yesterday).
Without the debugging runtime we sometimes get segfaults and sometimes
errors like:
{{{
SalviaDerivationGateway: internal error: scavenge_mark_stack: unimplemented/strange closure type 1970861226 @ 0x7f7578f488f8
(GHC version 7.0.1.20110121 for x86_64_unknown_linux)
Please report this as a GHC bug:
http://www.haskell.org/ghc/reportabug
}}}
The last few system calls before a segfault are:
{{{
[pid 30727] rt_sigprocmask(SIG_BLOCK, [HUP INT], [], 8) = 0
[pid 30727] clock_gettime(0xfffffffa /* CLOCK_??? */, {147, 512463346}) = 0
[pid 30727] getrusage(RUSAGE_SELF, {ru_utime={126, 620000}, ru_stime={20, 890000}, ...}) = 0
[pid 30727] mmap(0x7fb643800000, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb643400000
[pid 30727] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
}}}
We were very concerned at first, because an unstable runtime system makes
it feel as though we would be better off using Java for "serious"
applications. For now it is not a problem for us, since we will simply not
use these tuning flags. It might be a good idea to remove them entirely
until they are known to work in busy applications (or at least to include
a warning).
I don't understand the details, but the problem with retainer profiling
(issue #4820) may have the same cause.
When testing new releases it would probably be a good idea to also test
various flag combinations (maybe the GHC compiler binary could just choose
some random values during startup if none are given ;-).
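
As a rough illustration of testing several flag combinations, the sketch
below prints a small matrix of RTS option strings built only from the four
flags used in this report (-G, -H, -c, -I). The function name
rts_flag_matrix and the chosen values are assumptions made for this
example; they are not part of GHC or its test suite.

```shell
#!/bin/sh
# Sketch: enumerate a small matrix of RTS flag combinations.
# rts_flag_matrix is a hypothetical helper; the values are illustrative.
rts_flag_matrix() {
    for gens in 2 4; do                  # -G<n>: number of generations
        for heap in 256m 1g; do          # -H<size>: suggested heap size
            for compact in "" "-c"; do   # -c: compacting collection on/off
                for idle in 0 0.3; do    # -I<secs>: idle GC delay (0 disables it)
                    # ${compact:+...} emits "-c " only when compaction is selected
                    echo "+RTS -G$gens -H$heap ${compact:+$compact }-I$idle -RTS"
                done
            done
        done
    done
}

rts_flag_matrix
```

Each printed line could then be appended to the command line of the
program under test, recording which combinations crash.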
I hope this information is of some help. We haven't tried to reproduce
the problem with a small test program, as we're in a bit of a hurry with a
release. If there is anything we can do to help find the cause of the
problem, please let us know.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4922>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler