Hello,

I've found that all of my comments in bugs HARMONY-5019 (the original bug report is HARMONY-3269) and HARMONY-3581 may be confusing, and none of them is complete. The cause of the bug is quite complex, so I decided to write this post as a reference for the future (I hope there won't be any more bugs like this) and as a complete explanation of the bug. I also hope that someone who understands the GPLed code discussed later may write a reply to this email.
The crash symptom is a stack with the _Unwind_ForcedUnwind function in it, as shown in the description of HARMONY-3581. The stack usually ends with some weird address, often 0xdeadbeefdeadbeef. The instruction that crashes is the one that tries to access this address (usually it moves the value at this address into some register; I always saw RDX), but the address is not mapped and therefore not accessible.

Two things lead to a call to _Unwind_ForcedUnwind: either a C++ exception is thrown, or pthread_cancel cancels the thread. For a C++ exception, gcc calls the libgcc_s function _Unwind_ForcedUnwind. For pthread_cancel, a signal handler that handles SIGCANCEL from the pthread library tries to throw an uncatchable exception and unwinds the stack using _Unwind_ForcedUnwind in a way identical to C++ exception unwinding. Why it throws an uncatchable exception I don't know; I didn't read the glibc code to understand the pthreads logic, because it is under the GPL. Probably it tries to determine the location where SIGCANCEL was received by the thread. In any case, stack unwinding is started for some thread.

On x86_64, stack unwinding is a tricky business because there are no stack frames as on x86, so the libgcc_s code relies mostly on DWARF2 information. For some reason unknown to me, the unwinding code scans the whole stack even if there is a C++ exception handler on it. The unwinding code steps from callee to caller quite well on all of the code that I've seen, but it fails when the caller is no longer mapped code, because it doesn't only analyze the thread stack; it also tries to read the instructions pointed to by the return address. There is some heuristic for the x86_64 architecture that requires checking the code, not only the return address on the stack. So if there is any unmapped code on the thread stack, a crash is inevitable.

The crash handler usually doesn't help, because it doesn't show any frames further down the stack once it encounters memory with no read permission.
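To make the pthread_cancel path concrete, here is a minimal C sketch of a thread that is canceled while it is blocked in pthread_cond_timedwait. It is not DRLVM or glibc code, and every name in it (worker, on_cancel, run_cancel_demo and the variables) is made up for illustration. As far as I understand, on Linux with glibc the cancellation reaches the cleanup handler by unwinding the worker's stack, which is why every return address on that stack has to point at code that is still mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* All names below are hypothetical; this only illustrates the pattern. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t started = PTHREAD_COND_INITIALIZER; /* worker -> main */
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;    /* never signaled */
static int worker_waiting = 0;
static int cleanup_ran = 0;

/* Runs while the canceled thread's stack is being unwound.
 * pthread_cond_timedwait re-acquires the mutex before cleanup handlers
 * run, so the handler has to release it. */
static void on_cancel(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(on_cancel, NULL);

    /* Tell the main thread that we are about to block. */
    worker_waiting = 1;
    pthread_cond_signal(&started);

    for (;;) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;
        /* Cancellation point: pthread_cancel takes effect here. */
        pthread_cond_timedwait(&wake, &lock, &deadline);
    }

    pthread_cleanup_pop(0); /* never reached; pairs with the push above */
    return NULL;
}

/* Returns 1 if the worker was canceled and its cleanup handler ran. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *result = NULL;
    int rc;

    worker_waiting = 0;
    cleanup_ran = 0;
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    /* Wait until the worker holds the lock and is about to wait. */
    pthread_mutex_lock(&lock);
    while (!worker_waiting)
        pthread_cond_wait(&started, &lock);
    pthread_mutex_unlock(&lock);

    pthread_cancel(t);
    /* Join before anything that could unmap the worker's code. */
    pthread_join(t, &result);

    return result == PTHREAD_CANCELED && cleanup_ran;
}
```

The part that matters for this bug is the order at the end: pthread_join has to return before the library containing the thread's code is unloaded (for example with dlclose). In this sketch nothing is unloaded, so the cancellation completes cleanly.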
As a result, the cause of such a crash is usually not evident. Why unmapped code ended up on a thread's stack when executing applications on DRLVM is a separate question. In two cases there were bugs. In the first case, a JVMTI agent was unloaded and then its thread was canceled with pthread_cancel (HARMONY-5019). In the second case, the interpreter library was unloaded too early, and a thread was likewise canceled with pthread_cancel (HARMONY-3581). In both cases the interrupted threads were executing pthread_cond_timedwait, and the other functions further down the stack were valid code. But the code from the unloaded libraries that had called this wait was still somewhere on the stack, and therefore canceling such threads caused a crash.

These two bugs are fixed now, but something similar may happen in the future, which is why I wrote this text. Under no circumstances should there be unmapped code on any thread's stack, even when running on Linux x86 or on Windows of any architecture.

-- Gregory
