Reply inline…
On Nov 16, 2016, at 4:29 PM, Bernhard Urban via android-devel
<[email protected]> wrote:
> Every time I look at a runtime bug on Android, something doesn't feel right.
> Reports look different from each other. So I tried to get a better
> understanding of how we handle a SIGSEGV in the runtime and what the output
> is supposed to be. There are three basic steps [1]:
>
> (1) we print a managed stacktrace.
> (2) we print a native stacktrace: we do that either via libunwind or
> libcorkscrew, depending on what is available. If neither is available, we do
> nothing.
> (3) we call `exit (-1)`, which might give us more information such as a
> register dump.
Unfortunately, there are (implicitly!) *more* than three basic steps, and I’m
fairly sure I still don’t understand everything that is going on. For more
wonderful context:
https://github.com/mono/mono/commit/5d07b77a67f61576318a30e8b1c5f65f7f26b1cf
> when a process crashes on Android, ideally:
>
> 1. The Android signal handler is executed,
> 2. Bionic will attempt to connect to /system/bin/debuggerd.
> 3. debuggerd will try to connect to the crashing process, then
> retrieve "useful" information from the crashing process (stack
> trace, register values, etc.)
The “fun” is in trying to mix Mono’s SIGSEGV handling mechanism with Android’s
infrastructure, which has a separate process (`debuggerd`) connect to the
crashing process to dump its state.
Additionally, I *believe* (but have not retested or reverified) that the
`exit(-1)` within `mini-exceptions.c` won’t be executed, because
Xamarin.Android calls `mono_set_crash_chaining(1)`, which sets
`mono_do_crash_chaining` to 1:
https://github.com/xamarin/xamarin-android/blob/f862032/src/monodroid/jni/monodroid-glue.c#L2802
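To make the chaining idea concrete, here is a minimal sketch in C of what I
understand "crash chaining" to mean: the runtime's handler does its own
reporting and then, instead of exiting, hands the signal to whichever handler
was installed before it. This is *not* Mono's actual code. Every `sketch_*`
name is hypothetical, `sketch_do_crash_chaining` is only a stand-in for
`mono_do_crash_chaining`, and `sketch_platform_handler` merely plays the role
of the handler Android installs. The sketch also uses `_exit` rather than
`exit`, because `exit` is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the runtime's mono_do_crash_chaining flag. */
volatile sig_atomic_t sketch_do_crash_chaining = 0;

/* Whatever handler was installed before ours (on Android, the platform's). */
struct sigaction sketch_previous_action;

/* Hypothetical stand-in for the platform handler; it only counts its calls. */
volatile sig_atomic_t sketch_platform_handler_calls = 0;

void sketch_platform_handler(int signo)
{
    (void)signo;
    sketch_platform_handler_calls++;
}

/* Runtime-side handler: report first, then either exit or chain. */
void sketch_crash_handler(int signo, siginfo_t *info, void *context)
{
    static const char msg[] = "sketch: runtime crash handler ran\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    if (!sketch_do_crash_chaining)
        _exit(255); /* same exit status as exit(-1), but async-signal-safe */

    if (sketch_previous_action.sa_flags & SA_SIGINFO) {
        /* Previous handler uses the three-argument form. */
        sketch_previous_action.sa_sigaction(signo, info, context);
    } else if (sketch_previous_action.sa_handler == SIG_DFL) {
        /* No previous handler: restore the default action and re-raise. */
        signal(signo, SIG_DFL);
        raise(signo);
    } else if (sketch_previous_action.sa_handler != SIG_IGN) {
        /* Previous handler uses the one-argument form. */
        sketch_previous_action.sa_handler(signo);
    }
}

/* Install our handler for signo, remembering the one it replaces. */
int sketch_install_crash_handler(int signo, int chain)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sketch_crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sketch_do_crash_chaining = chain;
    return sigaction(signo, &sa, &sketch_previous_action);
}
```

With chaining enabled, installing `sketch_platform_handler` first and then
calling `sketch_install_crash_handler(SIGUSR1, 1)` means a `raise(SIGUSR1)`
runs both handlers in turn. With chaining disabled, the process exits from the
first handler and the platform handler never sees the signal.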
Not that any of the above actually helps improve reliability…
> That's the idea; unfortunately, that is not always what we get. In order to
> see the behaviour across different devices and versions of Android, I made
> this simple crashing app: [2]. As soon as you click the button the
> application segfaults. For that I wrote a UI test and sent it off to Xamarin
> Test Cloud and collected the logs: [3]. Note that every device ran the same
> APK.
>
> Out of 19 devices, there are really only two devices where the crash report
> looks like it should: samsung_google_nexus_10-4.4.txt and
> xiaomi_mi_4-4.4.4.txt. On many devices we only get a managed stacktrace and
> then the fun is over.
>
> Why?
>
> Good question. Luckily I have a device on my desk where this is the case, so
> I did a bit of printf debugging. What I figured out is that the call to
> `mono_exception_native_unwind ()` in [4] is where the fun stops. The message
> I see in adb logcat:
>
> 11-15 20:51:44.790 7093 7093 E audit : type=1701
> msg=audit(1479239504.790:1839): auid=4294967295 uid=10288 gid=10288
> ses=4294967295 subj=u:r:untrusted_app:s0:c512,c768 pid=14937
> comm="artup.lulzcrash" exe="/system/bin/app_process32" sig=11
Are there any other `adb logcat` messages? The above looks like an
SELinux-related message. (I have no idea what it *means*, but that’s what it
looks like…)
> I see the text of a printf right before that call. A printf at the beginning
> of the function never runs. If I move `mono_exception_native_unwind ()`
> right before the managed stack unwinding, it crashes there, so it isn't a
> timeout mechanism. I have no idea why on earth this is the case.
> Unfortunately there is no clue which PC the signal is coming from (maybe
> we cause another fault in the handler? maybe Android interferes somehow?)
`debuggerd`?
> Does anyone have an idea? Please tell me I'm overlooking something obvious
> here. (I haven't had success yet with gdb/lldb)
I’ve only had success with gdb when using 32-bit targets. 64-bit targets give
me gdb protocol mismatch errors. :-(
> Regardless, I want to suggest some things:
>
> (a) we should get rid of the dynamic loading of libunwind/libcorkscrew. Some
> devices don't ship them. Instead, we should include them in the runtime. I
> think it's worth the extra footprint (if footprint is the reason it wasn't
> done in the first place).
This is *absolutely* something we should consider. This is even more important
in the context of Android 7.0 Nougat, which won’t allow us to load those native
libraries, even if they do exist.
- Jon