Hi team,

Every time I look at a runtime bug on Android, something doesn't feel right: 
crash reports look different from each other. So I tried to get a better 
understanding of how we handle a SIGSEGV in the runtime and what the output is 
supposed to be. There are three basic steps [1]:

(1) We print a managed stacktrace.
(2) We print a native stacktrace, either via libunwind or libcorkscrew, 
depending on which one is available. If neither is available, we do nothing.
(3) We call `exit (-1)`, which might give us more information, such as a 
register dump.
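The three steps can be sketched as a plain POSIX SIGSEGV handler. This is a simplified illustration, not the code in mini-exceptions.c: the managed stacktrace is a placeholder string, and `native_unwinder` is a hypothetical hook standing in for whichever of libunwind/libcorkscrew was loaded (left NULL here, i.e. the "neither is available" case).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical hook: would point at a libunwind- or libcorkscrew-based
 * walker if one could be loaded; stays NULL if neither is available. */
typedef void (*native_unwinder_fn)(int fd);
static native_unwinder_fn native_unwinder = NULL;

/* write(2) is async-signal-safe, printf is not. */
static void write_str(int fd, const char *s)
{
    size_t len = strlen(s);
    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return;
        s += n;
        len -= (size_t)n;
    }
}

/* Steps (1) and (2) of the crash report, written to fd. */
static void write_crash_report(int fd)
{
    /* (1) managed stacktrace: placeholder for the runtime's managed walk */
    write_str(fd, "managed stacktrace:\n  <managed frames here>\n");

    /* (2) native stacktrace, only if some unwinder was found */
    if (native_unwinder != NULL) {
        write_str(fd, "native stacktrace:\n");
        native_unwinder(fd);
    }
    /* neither libunwind nor libcorkscrew: print nothing */
}

static void on_sigsegv(int sig)
{
    (void)sig;
    write_crash_report(STDERR_FILENO);
    /* (3) leave via exit (-1), mirroring the runtime on Android.
     * Note that exit() is not async-signal-safe; _exit() would be. */
    exit(-1);
}

static int install_crash_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigsegv;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Writing the report through write(2) instead of printf keeps steps (1) and (2) within what is safe to call from a signal handler.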


That's the idea; unfortunately, it is not always what we get. To see the 
behaviour across different devices and Android versions, I made a simple app 
that crashes on purpose [2]: as soon as you click the button, it segfaults. I 
wrote a UI test that clicks the button, ran it on Xamarin Test Cloud, and 
collected the logs [3]. Note that every device ran the same APK.

Out of 19 devices, there are really only two where the crash report looks like 
it should: samsung_google_nexus_10-4.4.txt and xiaomi_mi_4-4.4.4.txt. On many 
devices we only get a managed stacktrace, and then the fun is over.

Why?

Good question. Luckily I have one of the affected devices on my desk, so I did 
a bit of printf debugging. What I found is that the call to 
`mono_exception_native_unwind ()` in [4] is where the fun stops. This is the 
message I see in adb logcat:

11-15 20:51:44.790  7093  7093 E audit   : type=1701 
msg=audit(1479239504.790:1839): auid=4294967295 uid=10288 gid=10288 
ses=4294967295 subj=u:r:untrusted_app:s0:c512,c768 pid=14937 
comm="artup.lulzcrash" exe="/system/bin/app_process32" sig=11

A printf right before that call shows up, but a printf at the very beginning of 
the function never does. If I move `mono_exception_native_unwind ()` to right 
before the managed stack unwinding, it crashes there instead, so it isn't a 
timeout mechanism. I have no idea why on earth this happens. Unfortunately, 
there is no clue about which PC the signal comes from (maybe we cause another 
fault in the handler? maybe Android interferes somehow?)
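One way to test the "another fault in the handler" guess in isolation: on Linux, a SIGSEGV raised while the SIGSEGV handler is still running (the signal is blocked then, unless SA_NODEFER is set) is delivered with the default action, so the process dies by signal 11 instead of reaching its exit call. The sketch below crashes a forked child with one of two hypothetical handlers and reports how the child ended; a `sig=11` in the audit record would be consistent with the second case, though this sketch does not prove that is what happens on the device.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A PROT_NONE page: writing to it is a guaranteed SIGSEGV that the compiler
 * cannot optimize away, unlike a literal NULL dereference. */
static volatile char *guard_page;

static void touch_guard_page(void)
{
    *guard_page = 1;
}

/* Handler that finishes and leaves with status -1 (the parent sees 255).
 * _exit() is used so the forked child does not flush inherited stdio buffers. */
static void clean_handler(int sig)
{
    (void)sig;
    _exit(-1);
}

/* Handler that faults again before it reaches its exit call. */
static void faulting_handler(int sig)
{
    (void)sig;
    touch_guard_page();
    _exit(-1); /* never reached */
}

/* Trigger a SIGSEGV in a child process with `handler` installed.
 * Returns the child's exit code (>= 0), minus the signal that killed it (< 0),
 * or INT_MIN if fork/waitpid failed. */
static int crash_in_child(void (*handler)(int))
{
    pid_t pid = fork();
    if (pid < 0)
        return INT_MIN;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core); /* don't leave core files behind */

        void *page = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            _exit(2);
        guard_page = page;

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask); /* SIGSEGV itself stays blocked while the handler runs */
        sa.sa_handler = handler;
        sigaction(SIGSEGV, &sa, NULL);

        touch_guard_page(); /* first fault: enters the handler */
        _exit(0);           /* never reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return INT_MIN;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return INT_MIN;
}
```

On Linux, `crash_in_child(clean_handler)` is expected to return 255 and `crash_in_child(faulting_handler)` to return `-SIGSEGV`.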

Does anyone have an idea? Please tell me I'm overlooking something obvious 
here. (I haven't had any success with gdb/lldb yet.)




Regardless, I want to suggest some things:

(a) We should get rid of the dynamic loading of libunwind/libcorkscrew, since 
some devices don't ship them. Instead, we should include the unwinder in the 
runtime itself. I think it's worth the extra footprint (if that is the reason 
it wasn't done in the first place).
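For comparison, a minimal sketch of an unwinder that is linked in rather than looked up at crash time. It assumes a GCC or Clang toolchain that ships `<unwind.h>` and uses the toolchain's `_Unwind_Backtrace` interface, which is a different API from libunwind's `unw_*` functions; `capture_backtrace` is a hypothetical name, and whether this walks cleanly through the signal frame on every Android/ARM combination is an assumption that would need testing on the devices above.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

struct frame_buf {
    uintptr_t *pcs;  /* output array of program counters */
    size_t max;      /* capacity of pcs */
    size_t count;    /* frames collected so far */
};

/* Called once per frame by _Unwind_Backtrace. */
static _Unwind_Reason_Code collect_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct frame_buf *buf = arg;
    uintptr_t pc = (uintptr_t)_Unwind_GetIP(ctx);
    if (pc != 0 && buf->count < buf->max)
        buf->pcs[buf->count++] = pc;
    /* stop the walk once the buffer is full */
    return buf->count < buf->max ? _URC_NO_REASON : _URC_END_OF_STACK;
}

/* Fill pcs with up to max return addresses of the current thread's stack.
 * Returns the number of frames captured. */
static size_t capture_backtrace(uintptr_t *pcs, size_t max)
{
    struct frame_buf buf = { pcs, max, 0 };
    if (max > 0)
        _Unwind_Backtrace(collect_frame, &buf);
    return buf.count;
}
```

The captured PCs are raw addresses; turning them into symbols offline still needs the module base addresses, which is where (c) comes in.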

(b) We should dump the registers and memory areas ourselves, so that it's 
under our control and doesn't depend on the mood of a particular Android 
implementation. A hexdump of the memory around the PC would be especially 
useful (combined with tools like [5]).
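A sketch of a hexdump that could run inside the crash handler: it only does arithmetic and writes into a caller-supplied buffer (no printf, no malloc), so the handler can pass the result to write(2). `hexdump_around` and its 16-byte alignment are illustrative choices, and the sketch assumes the window around the PC is mapped and readable; a real implementation would have to check that first (for example against /proc/self/maps), otherwise the dump itself faults inside the handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Append one character, always leaving room for the terminating NUL. */
static void put_char(char *out, size_t size, size_t *pos, char c)
{
    if (*pos + 1 < size)
        out[*pos] = c;
    (*pos)++;
}

/* Append v as exactly `digits` lowercase hex digits. */
static void put_hex(char *out, size_t size, size_t *pos, uintptr_t v, int digits)
{
    for (int i = digits - 1; i >= 0; i--)
        put_char(out, size, pos, "0123456789abcdef"[(v >> (4 * i)) & 0xf]);
}

/* Hex-dump [start, start + len) into out, 16 bytes per line, formatted as
 * "<address>: xx xx ...\n". Returns the length the full dump needs
 * (like snprintf); the output is truncated if out is too small. */
static size_t hexdump(char *out, size_t size, const unsigned char *start, size_t len)
{
    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        if (i % 16 == 0) {
            put_hex(out, size, &pos, (uintptr_t)(start + i), (int)(2 * sizeof(uintptr_t)));
            put_char(out, size, &pos, ':');
        }
        put_char(out, size, &pos, ' ');
        put_hex(out, size, &pos, start[i], 2);
        if (i % 16 == 15 || i + 1 == len)
            put_char(out, size, &pos, '\n');
    }
    if (size > 0)
        out[pos < size ? pos : size - 1] = '\0';
    return pos;
}

/* Dump about `radius` bytes on either side of pc, starting on a 16-byte
 * boundary. ASSUMPTION: the whole window is mapped and readable. */
static size_t hexdump_around(char *out, size_t size, uintptr_t pc, size_t radius)
{
    uintptr_t from = (pc - radius) & ~(uintptr_t)15;
    uintptr_t to = pc + radius;
    return hexdump(out, size, (const unsigned char *)from, (size_t)(to - from));
}
```

Each output line starts with the address of its first byte, so the bytes can be pasted into a disassembler such as [5] with the right base address.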

(c) We should print /proc/$PID/maps, so we get a mapping of the currently 
loaded modules. This gives us a fallback in case libunwind fails (at least I 
couldn't find the base address of loaded modules like libmonosgen.so in the 
logs; did I miss it?).
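A sketch of (c): copy /proc/self/maps to a file descriptor using only open/read/write, plus a small hypothetical helper, `parse_base`, that extracts the start address of the first mapping mentioning a module name from saved maps text. Treating the first mapping of a library as its load base is an assumption about the loader's layout; subtracting that base from a PC in the native stacktrace gives a module-relative offset that addr2line can resolve.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Copy /proc/self/maps to fd with open/read/write only (async-signal-safe).
 * Returns the number of bytes copied, or -1 if the file cannot be opened. */
static long dump_maps(int fd)
{
    int in = open("/proc/self/maps", O_RDONLY);
    if (in < 0)
        return -1;
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fd, buf + off, (size_t)(n - off));
            if (w <= 0) {
                close(in);
                return total;
            }
            off += w;
            total += w;
        }
    }
    close(in);
    return total;
}

/* Return the start address of the first line in `maps` (text in the
 * /proc/<pid>/maps format, "start-end perms ...") that contains `name`,
 * or 0 if no line does. Note: the match may occur anywhere in the line. */
static uintptr_t parse_base(const char *maps, const char *name)
{
    const char *hit = strstr(maps, name);
    if (hit == NULL)
        return 0;
    /* walk back to the start of the line containing the match */
    const char *p = hit;
    while (p > maps && p[-1] != '\n')
        p--;
    /* parse the hex start address, which ends at '-' */
    uintptr_t base = 0;
    for (; *p != '-' && *p != '\0'; p++) {
        int d;
        if (*p >= '0' && *p <= '9')
            d = *p - '0';
        else if (*p >= 'a' && *p <= 'f')
            d = *p - 'a' + 10;
        else
            return 0;
        base = (base << 4) | (uintptr_t)d;
    }
    return base;
}
```

In the handler, `dump_maps(STDERR_FILENO)` would put the maps into the log; offline, `parse_base(saved_maps, "libmonosgen.so")` recovers the base to subtract from a native PC.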



Comments? :-)


Thanks,
-Bernhard


[1] https://github.com/mono/mono/blob/94b8270e9bdbd9280de1ec144af20877d8c8d055/mono/mini/mini-exceptions.c#L2348
[2] https://gist.github.com/lewurm/8203e7087c72388a820f67502eca19fd
[3] https://gist.github.com/lewurm/4130c23742bc5694898d2c39ced29e52
[4] https://github.com/mono/mono/blob/94b8270e9bdbd9280de1ec144af20877d8c8d055/mono/mini/mini-exceptions.c#L2434
[5] https://onlinedisassembler.com/odaweb/
_______________________________________________
Mono-devel-list mailing list
http://lists.dot.net/mailman/listinfo/mono-devel-list
