*High Level Details*

Bug 1741205 <https://bugzilla.mozilla.org/show_bug.cgi?id=1741205> has just landed.

minidump-stackwalk <https://searchfox.org/mozilla-central/search?q=minidump-stackwalk&path=>, the tool used to produce a backtrace from a minidump when a test unexpectedly crashes (via mozcrash.py), has been completely rewritten in Rust.

*The core behaviour of the tool should be largely unchanged.* It should produce essentially the same backtraces, although I expect it will be faster and more reliable. If you notice any problems, please let us know by filing a bug against "Toolkit > Crash Reporting" or by leaving a comment on Bug 1741205 <https://bugzilla.mozilla.org/show_bug.cgi?id=1741205>.

Although this is *ideally* a non-functional change for almost everyone, this is a major step in the ongoing oxidation of our crash reporting infrastructure: this is the first deployment of rust-minidump <https://github.com/luser/rust-minidump/> as a replacement for breakpad's minidump processor. This same implementation will (if testing goes smoothly) become the new backend for crash-stats.mozilla.org.

Moving to rust-minidump will make it easier to maintain our crash reporting infrastructure and make improvements to it. We've already found several long-standing issues that it accidentally fixes *and* found that it runs faster than the old version. And of course, moving from C++ to Rust trivially improves the security of our infrastructure. Hooray!

Perhaps more immediately interesting to some of you: *rust-minidump also makes it a lot easier to be a crash-analysis power user, and is easier to build and run locally* (see the last section).

Sections:

- *What is affected* (and how to tell which version is being used)
- *Nitty Gritty Details* (how builds/tooling configs have changed)
- *Crash Analysis Power User Features* (digging deeper into minidumps)

*What is affected*

This tool is used to generate this kind of line in the try frontend:

PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]

And this information in the logs:

INFO - mozcrash Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe --human C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp Z:\task_163718277621850\build\symbols --symbols-url=https://symbols.mozilla.org/
INFO - mozcrash Saved minidump as Z:\task_163718277621850\build\blobber_upload_dir\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]
INFO - Crash dump filename: C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - Operating system: Windows NT
INFO - 10.0.19041
INFO - CPU: amd64
INFO - family 6 model 85 stepping 7
INFO - 8 CPUs
INFO -
INFO - Crash reason: EXCEPTION_BREAKPOINT
INFO - Crash address: 0x7ff8ae0df019
INFO - Process uptime: 2 seconds
INFO -
INFO - Thread 3 TaskController #2 (crashed)
INFO - 0 xul.dll!static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool) [SurfaceCache.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 1831 + 0x40]
INFO - rax = 0x00007ff8b541d9f9 rdx = 0x0000000000000000
INFO - rcx = 0x00007ff8e277c978 rbx = 0x0000000000000001
INFO - rsi = 0x00000041a977f350 rdi = 0x00000177b0f6f740
INFO - rbp = 0x00000177aa172130 rsp = 0x00000041a977f2b0
INFO - r8 = 0x00000041a977f820 r9 = 0x00007ff8ea530000
INFO - r10 = 0x00007ff8ea582651 r11 = 0x00000041a977ec70
INFO - r12 = 0x00007ff8e26e9630 r13 = 0x00000177b3a15120
INFO - r14 = 0x00000177b0f970d8 r15 = 0x00000177b28a2a00
INFO - rip = 0x00007ff8ae0df019
INFO - Found by: given as instruction pointer in context
INFO - 1 xul.dll!mozilla::image::DecodedSurfaceProvider::FinishDecoding() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 200 + 0x37]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f310 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0dbf
INFO - Found by: call frame info
INFO - 2 xul.dll!mozilla::image::DecodedSurfaceProvider::Run() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 129 + 0x7]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f390 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0929
INFO - Found by: call frame info

If you aren't sure which implementation is being used: only the new implementation includes the `--human` flag in the "Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe" line. However, I believe this line is omitted if fix-stacks.py is involved. In that case, you can rely on the fact that the new tool's backtraces end with an "Unimplemented streams encountered" section, a self-debugging feature unique to the new tool:

INFO - Unimplemented streams encountered:
INFO - Stream 0x00000000 UnusedStream (Official) @ 0x00000000
INFO - Stream 0x00000015 SystemMemoryInfoStream (Official) @ 0x00002d98
INFO - Stream 0x00000016 ProcessVmCountersStream (Official) @ 0x00002f84

As far as I know (unverified), these differences also apply to local crashes.

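For anyone who wants to automate the check described above, here is a minimal Python sketch. It only looks for the two markers mentioned in this message (the `--human` argument on the "Copy/paste" line, and the "Unimplemented streams encountered" section). The function name and the "rust"/"legacy"/"unknown" labels are hypothetical, and the marker strings are copied from the sample log rather than from any stable interface, so treat them as assumptions.

```python
# Marker strings copied from the sample logs in this message. They are
# assumptions: nothing guarantees the log format stays the same.
COPY_PASTE_MARKER = "mozcrash Copy/paste:"
UNIMPLEMENTED_MARKER = "Unimplemented streams encountered:"


def detect_stackwalk_implementation(log_text):
    """Guess which minidump-stackwalk produced a test log.

    Returns "rust" if either marker of the new tool is present, "legacy" if a
    Copy/paste line was seen without --human, and "unknown" otherwise.
    """
    saw_copy_paste = False
    for line in log_text.splitlines():
        if UNIMPLEMENTED_MARKER in line:
            return "rust"
        if COPY_PASTE_MARKER in line:
            saw_copy_paste = True
            # Split on whitespace so "--human" must be a standalone argument.
            if "--human" in line.split():
                return "rust"
    return "legacy" if saw_copy_paste else "unknown"
```

Since the Copy/paste line may be missing when fix-stacks.py is involved, an "unknown" result only means neither marker was found, not that the old tool was used.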
*Nitty Gritty Details*

On paper, this should be a significant upgrade to try's minidump-stackwalk, in that the one that is currently being used is a weird unmaintained fork of what's used on crash-stats. The new one should be faster and more reliable, and will be easy to update by just changing the commit in its toolchain fetch <https://searchfox.org/mozilla-central/source/taskcluster/ci/fetch/toolchains.yml>:

rust-minidump:
    description: rust-minidump source code (for minidump-stackwalk)
    fetch:
        type: git
        repo: https://github.com/luser/rust-minidump/
        revision: 0c90e02544797317503d1c4cff8daab0cabdea86

However, this introduces some changes to how minidump-stackwalk is built:

- minidump-stackwalk no longer builds from checked-in source, so it only needs to be rebuilt when the toolchain fetch is updated (fewer builds and less churn, yay!)
- This introduces a currently orphaned win64-minidump-stackwalk build, for future use in solving Bug 1410840. That can be removed if having an orphan tool sets off some annoying warnings/errors for the sheriffs.
- There was technically some wiring in `mach` to support building minidump-stackwalk locally. This no longer works, as there is no local source to use. mozboot was already downloading `minidump_stackwalk` for you, so it's unlikely this will affect anyone's workflow.
- If for whatever reason you want to build your own copy of minidump-stackwalk, you can:
    - `cargo install minidump-stackwalk`
    - or check out the rust-minidump tree <https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk> and build it with `cargo build --release` (that's all our build servers do!)
    - NB: the Rust binary is officially "minidump-stackwalk", but we rename it to "minidump_stackwalk" in CI to avoid needless churn. If you do either of these things, you *will* get a binary called "minidump-stackwalk".

*Crash Analysis Power User Features*

The new minidump-stackwalk <https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk> should build and run locally on all mainstream platforms without any issue via "cargo install minidump-stackwalk". I have done my best to write user documentation <https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk#analyzing-firefox-minidumps>, and the process for analyzing a Firefox minidump is streamlined:

> minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ /path/to/minidump.dmp

Because it is designed to be the backend for crash-stats (aka socorro), it can do ~all the analysis you expect there (and more details are surfaced in the default JSON output than in the --human one used by "try" and local builds).

To help me test this, I have also created a tool for downloading minidumps/annotations from crash-stats and comparing local minidump-stackwalk output to the values on crash-stats: socc-pair <https://github.com/Gankra/socc-pair/>. While the comparison machinery is probably not directly useful to you, as a side effect of its purpose it automates fetching all the details of a crash and running local analysis, and saves the results to files you can search through for details:

socc-pair --api-token=f0c129d4467bf58eeca0ad8e8e5d --crash-id=b4f58e9f-49be-4ba5-a203-8ef160211027

<lots of interesting analysis>

...

Output Files:
* Minidump: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.dmp
* Socorro Processed Crash: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.json
* Raw JSON: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.raw.json
* Local minidump-stackwalk Output: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.local.json
* Local minidump-stackwalk Logs: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.log.txt

*NOTE: these files can contain protected user information. Although they are written to your system's default "temp", I recommend deleting the temp `socc-pair` directory regularly to ensure compliance with our protected data policies <https://crash-stats.mozilla.org/documentation/protected_data_access/>.*

Notably, this includes all of the logging rust-minidump did (in the `.log.txt`), including tracing for the backtracer's analysis <https://github.com/Gankra/socc-pair/#debugging-backtraces>, which can help you debug strange backtraces:

[TRACE] unwind: starting stack unwind
[TRACE] unwind: unwinding NtGetContextThread
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 8 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x000000ec00000000, caller_sp: 0x000000ec7fbfd790
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: unwinding 1013612281855
[TRACE] unwind: trying cfi
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd172c2a24, caller_sp: 0xec7fbfd7f8
[TRACE] unwind: unwinding <unknown in ntdll.dll>
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd162b7034, caller_sp: 0xec7fbfd828
[TRACE] unwind: unwinding BaseThreadInitThunk
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 48 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x0000000000000000, caller_sp: 0x000000ec7fbfd858
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: instruction pointer was nullish, assuming unwind complete
[TRACE] unwind: finished stack unwind

If you are using minidump-stackwalk directly, you can get these same logs with `--verbose=trace`. The `--output-file=x/y/z` and `--log-file=x/y/z` arguments make it easier to send these streams to files.
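
When a trace log gets long, it can help to boil it down to "which technique produced each frame". The following is a rough Python sketch of that idea, not part of rust-minidump or socc-pair. The function name `summarize_unwind` is hypothetical, and the two patterns are inferred purely from the sample trace above ("unwinding <name>" starts a frame, "<technique> seems valid" ends it), so they are assumptions that may not hold for other versions of the tool.

```python
import re

# Patterns inferred from the sample trace in this message (assumptions, not a
# documented log format).
FRAME_START = re.compile(r"\[TRACE\] unwind: unwinding (.+)")
TECHNIQUE_OK = re.compile(r"\[TRACE\] unwind: (.+?) seems valid")


def summarize_unwind(log_text):
    """Return (frame_name, accepted_technique) pairs in unwind order."""
    summary = []
    current_frame = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        started = FRAME_START.match(line)
        if started:
            # A new frame is being unwound; remember its name.
            current_frame = started.group(1)
            continue
        accepted = TECHNIQUE_OK.match(line)
        if accepted and current_frame is not None:
            # e.g. "cfi result" or "scan", exactly as written in the log.
            summary.append((current_frame, accepted.group(1)))
            current_frame = None
    return summary
```

You would feed it the contents of the `.log.txt` file written by socc-pair, or of whatever file you passed to `--log-file`. Frames recovered by "scan" are usually the first place to look when a backtrace seems strange, since scanning is the least precise of the techniques shown in the trace.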
