Reply inline…

On Nov 16, 2016, at 4:29 PM, Bernhard Urban via android-devel 
<[email protected]> wrote:
> Every time I look at a runtime bug on Android, something doesn't feel right. 
> Reports look different from each other. So I tried to get a better 
> understanding of how we handle a SIGSEGV in the runtime and what the output 
> is supposed to be. There are three basic steps [1]:
> 
> (1) we print a managed stacktrace.
> (2) we print a native stacktrace: we do that either via libunwind or 
> libcorkscrew, depending on what is available. If neither is available, we do 
> nothing.
> (3) we call `exit (-1)`, which might give us more information such as a 
> register dump.
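
For reference, the three steps quoted above can be sketched as a tiny POSIX 
program. This is only an illustration, not the runtime's code: the handler and 
helper names are hypothetical, the two "stacktrace" lines are placeholders, 
and `_exit` stands in for `exit` because it is async-signal-safe. The child 
process faults on purpose so the parent can observe the exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write(2) is async-signal-safe, printf is not. */
static void write_msg(const char *msg)
{
    ssize_t ignored = write(STDERR_FILENO, msg, strlen(msg));
    (void)ignored;
}

/* Hypothetical handler mirroring steps (1) to (3). */
static void sketch_segv_handler(int sig)
{
    (void)sig;
    write_msg("(1) managed stacktrace would be printed here\n");
    write_msg("(2) native stacktrace would be printed here, if an unwinder exists\n");
    _exit(-1); /* (3) the parent sees this as exit status 255 */
}

/* Forks a child that installs the handler and raises SIGSEGV.
   Returns the child's exit status, or -1 if it did not exit normally. */
int run_crashing_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = sketch_segv_handler;
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV);
        _exit(0); /* not reached when the handler runs */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_crashing_child()` from a test program returns 255, because the 
exit status is truncated to its low 8 bits.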

Unfortunately, there are (implicitly!) *more* than three basic steps, and I’m 
fairly sure I still don’t understand what all is going on. For more wonderful 
context:

        
https://github.com/mono/mono/commit/5d07b77a67f61576318a30e8b1c5f65f7f26b1cf
> when a process crashes on Android, ideally:
> 
> 1. The Android signal handler is executed,
> 2. Bionic will attempt to connect to /system/bin/debuggerd.
> 3. debuggerd will try to connect to the crashing process, then
>  retrieve "useful" information from the crashing process (stack
>  trace, register values, etc.)


The “fun” is in trying to intermix Mono’s SIGSEGV handling mechanism in with 
Android’s infrastructure, which involves having an extra process (`debuggerd`) 
connect to the process to dump process state.

Additionally, I *believe* (but have not retested or reverified) that the 
`exit(-1)` within `mini-exceptions.c` won’t be executed, because 
Xamarin.Android calls `mono_set_crash_chaining(1)`, which sets 
`mono_do_crash_chaining` to 1:

        
https://github.com/xamarin/xamarin-android/blob/f862032/src/monodroid/jni/monodroid-glue.c#L2802
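
To illustrate the general idea of crash chaining, here is a minimal sketch in 
plain POSIX C. It is not Mono's implementation, and I have not checked it 
against what the runtime does internally: `chaining_enabled` merely stands in 
for `mono_do_crash_chaining`, `platform_handler` stands in for the handler 
Android installs, and every name in the block is hypothetical. With chaining 
on, the runtime's handler restores the earlier handler and re-raises the 
signal instead of exiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static struct sigaction previous_action;       /* handler installed before ours */
static volatile sig_atomic_t chaining_enabled; /* stand-in for mono_do_crash_chaining */

/* Stand-in for the platform's handler (on Android, the one that contacts debuggerd). */
static void platform_handler(int sig)
{
    (void)sig;
    _exit(42);
}

/* Stand-in for the runtime's SIGSEGV handler. */
static void runtime_handler(int sig)
{
    /* The runtime's own stack dumps would be written here. */
    if (chaining_enabled) {
        /* Restore the earlier handler and re-raise. The signal is blocked
           while this handler runs, so it stays pending and is delivered to
           the earlier handler once this one returns. */
        sigaction(sig, &previous_action, NULL);
        raise(sig);
        return;
    }
    _exit(-1); /* only reached when chaining is off */
}

/* Returns the exit status of a child that faults with chaining on or off. */
int run_chained_child(int enable_chaining)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = platform_handler;
        sigaction(SIGSEGV, &sa, NULL);
        chaining_enabled = enable_chaining;
        sa.sa_handler = runtime_handler;
        sigaction(SIGSEGV, &sa, &previous_action); /* remember the platform handler */
        raise(SIGSEGV);
        _exit(0); /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch `run_chained_child(1)` yields the platform handler's status 
(42), while `run_chained_child(0)` yields 255 from the `_exit(-1)` path. A 
real hardware fault differs from `raise` in that returning from the handler 
re-executes the faulting instruction.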

Not that any of the above in any way helps further improve reliability…

> That's the idea, unfortunately that is not always what we get.  In order to 
> see the behaviour across different devices and versions of Android, I made 
> this simple crashing app: [2]. As soon as you click the button the 
> application segfaults. For that I wrote a UI test and sent it off to Xamarin 
> Test Cloud and collected the logs: [3]. Note that every device ran the same 
> APK.
> 
> Out of 19 devices, there are really only two devices where the crash report 
> looks like it should: samsung_google_nexus_10-4.4.txt and 
> xiaomi_mi_4-4.4.4.txt.  On many devices we only get a managed stacktrace and 
> then the fun is over.
> 
> Why?
> 
> Good question. Luckily I have a device on my desk where this is the case, so 
> I did a bit of printf debugging. What I figured out is that the call to 
> `mono_exception_native_unwind ()` in [4] is where the fun stops. The message 
> I see in adb logcat:
> 
> 11-15 20:51:44.790  7093  7093 E audit   : type=1701 
> msg=audit(1479239504.790:1839): auid=4294967295 uid=10288 gid=10288 
> ses=4294967295 subj=u:r:untrusted_app:s0:c512,c768 pid=14937 
> comm="artup.lulzcrash" exe="/system/bin/app_process32" sig=11

Are there any other `adb logcat` messages? The above looks like an 
SELinux-related message. (I have no idea what it *means*, but that’s what it 
looks like…)

> I see the text of a printf right before that call. A printf at the beginning 
> of the function never prints. If I move `mono_exception_native_unwind ()` 
> right before the managed stack unwinding, it crashes there, so it isn't a 
> timeout mechanism. I have no idea why on earth this is the case. 
> Unfortunately there is no clue as to which PC the signal is coming from 
> (maybe we cause another fault in the handler? maybe Android interferes 
> somehow?)

`debuggerd`?

> Does anyone have an idea?  Please tell me I'm overlooking something obvious 
> here.  (I haven't had success yet with gdb/lldb)

I’ve only had success with gdb when using 32-bit targets. 64-bit targets give 
me gdb protocol mismatch errors. :-(

> Regardless, I want to suggest some things:
> 
> (a) we should get rid of the dynamic loading of libunwind/libcorkscrew. Some 
> devices don't ship them. Instead, we should include it in the runtime. I 
> think it's worth the extra footprint (if footprint is the reason this wasn't 
> done in the first place).

This is *absolutely* something we should consider. This is even more important 
in the context of Android 7.0 Nougat, which won’t allow us to load those native 
libraries, even if they do exist.
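
As a rough illustration of why the dynamic approach is fragile, here is a 
sketch of the "try each library, otherwise do nothing" pattern. It is not the 
runtime's code: the function name is hypothetical, and the file names 
`libunwind.so` and `libcorkscrew.so` in the usage note are assumed.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Tries each candidate library in order. Returns the first handle that
   loads and stores its name in *loaded_name; returns NULL (and stores NULL)
   when none of the candidates can be loaded. */
void *open_first_available(const char *const *names, size_t count,
                           const char **loaded_name)
{
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(names[i], RTLD_LAZY | RTLD_LOCAL);
        if (handle != NULL) {
            if (loaded_name != NULL)
                *loaded_name = names[i];
            return handle;
        }
    }
    if (loaded_name != NULL)
        *loaded_name = NULL;
    return NULL; /* no unwinder available: the native stacktrace is skipped */
}
```

A caller would pass something like `{ "libunwind.so", "libcorkscrew.so" }` 
and skip the native stacktrace on a NULL result. On a device that doesn't 
ship either library, or where the platform refuses to load them, every 
`dlopen` call fails and we silently lose the native trace. Bundling an 
unwinder into the runtime removes that failure mode.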

- Jon

_______________________________________________
Mono-devel-list mailing list
[email protected]
http://lists.dot.net/mailman/listinfo/mono-devel-list
