On Sat, Oct 17, 2015 at 04:58:06PM +0200, Peter Bex wrote: > Well, here's a bunch of patches to fix this issue (#1221) as well as a > handful of memory-related issues. I still haven't managed to pinpoint > the crashes we're seeing on Salmonella, but this is a separate issue > which seems to be fixed by one of the attached patches.
OK, I've been doing some more digging, and I think I managed to figure out why the crashes are happening. It's extremely obscure and happens so rarely because it relies a *lot* on the particular call pattern, and even on timing because it gets triggered in handle_interrupt, which is time-driven. The strange "state vector" which we sometimes(?) use for capturing a thread's continuation will be abused as an argvector by passing a pointer straight into the middle of it. This means that due to the argvector re-use, this state vector may inadvertently mutated during the execution of the program, but only if the "right" (wrong?) functions are called in sequence. Please, apply the other patches (which this mail is a reply to) first, as I think they're pretty important as well. Then apply the attached patch. It applies to both master and chicken-5. Cheers, Peter
From f2b33d64a2fd98b66e3eecaf894de04130f74ccf Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Sat, 24 Oct 2015 21:17:07 +0200 Subject: [PATCH 1/2] Copy thread "state vector" to a fresh argvector before invoking the trampoline function. Upon resumption of a thread from a "state vector" (rather than a closure/continuation), the trampoline function was passed a pointer right into the state vector, which doubled as an argument vector. Now that argvectors may be re-used, their slots can be trashed any time in the call history, depending on the exact situation. These argvector slots will eventually contain temporary data which becomes invalid over time. When the original state vector is then copied during garbage collection, the mutated slot is also copied. When the Cheney algorithm traverses over this slot, it will break down because the data it points to is now invalid. This hopefully fixes the "random crashes" we've been seeing on Salmonella the last few weeks. --- runtime.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/runtime.c b/runtime.c index 43edfe9..2d477c0 100644 --- a/runtime.c +++ b/runtime.c @@ -7979,10 +7979,17 @@ void C_ccall C_context_switch(C_word c, C_word *av) /* closure = av[ 0 ] */ state = av[ 2 ], n = C_header_size(state) - 1, - adrs = C_block_item(state, 0); + adrs = C_block_item(state, 0), + *av2; C_proc tp = (C_proc)C_block_item(adrs,0); - tp(n, (C_word *)state + 2); + /* Copy argvector because it may be mutated in-place. The state + * vector should not be re-invoked(?), but it can be kept alive + * during GC, so the mutated argvector/state slots may turn stale. + */ + av2 = C_alloc(n); + C_memcpy(av2, (C_word *)state + 2, n * sizeof(C_word)); + tp(n, av2); } -- 2.1.4
signature.asc
Description: Digital signature
_______________________________________________ Chicken-hackers mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/chicken-hackers
