[dev-platform] Heads Up: New Crash Analyzer (minidump-stackwalk) for "Try" and Local Builds

Alexis Beingessner Tue, 23 Nov 2021 11:45:15 -0800

 *High Level Details*

Bug 1741205 <https://bugzilla.mozilla.org/show_bug.cgi?id=1741205> has just
landed.


minidump-stackwalk
<https://searchfox.org/mozilla-central/search?q=minidump-stackwalk&path=>,
the tool that is used to produce a backtrace from a minidump when a test
unexpectedly crashes (via mozcrash.py) has been completely rewritten in
Rust. *The core behaviour of the tool should be largely unchanged.* It
should produce basically the exact same backtraces, although I expect it
will be faster and more reliable. If you notice any problems, please let us
know by filing a bug against "Toolkit > Crash Reporting" or by leaving a
comment on Bug 174120 <https://bugzilla.mozilla.org/show_bug.cgi?id=1741205>.


Although this is *ideally *a non-functional change for almost everyone,
this is a major step in the ongoing oxidation of our crash reporting
infrastructure: this is the first deployment of rust-minidump
<https://github.com/luser/rust-minidump/> as a replacement for breakpad's
minidump processor. This same implementation will (if testing goes
smoothly) become the new backend for crash-stats.mozilla.org.

Moving to rust-minidump will make it easier to maintain our crash reporting
infrastructure and make improvements to it. We've already found several
long-standing issues that it accidentally fixes *and* found that it does
runs faster than the old version. And of course, moving from C++ to Rust
trivially improves the security of our infrastructure. Hooray!

Perhaps more immediately interesting to some of you:* rust-minidump also
makes it a lot easier to be a crash-analysis power user, and is easier to
build and run locally* (see the last section).

Sections:

   - *What is affected* (and how to tell which version is being used)
   - *Nitty Gritty details* (how builds/tooling configs have changed)
   - *New crash analysis power user features* (digging deeper into
   minidumps)



*What is affected*

This tool is used to generate this kind of line in the try frontend:

PROCESS-CRASH | Last test finished | application crashed [@ static
mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>,
bool)]

And this information in the logs:

INFO - mozcrash Copy/paste:
Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe
--human
C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
Z:\task_163718277621850\build\symbols --symbols-url=
https://symbols.mozilla.org/
INFO - mozcrash Saved minidump as
Z:\task_163718277621850\build\blobber_upload_dir\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - PROCESS-CRASH | Last test finished | application crashed [@ static
mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>,
bool)]
INFO - Crash dump filename:
C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - Operating system: Windows NT
INFO -                   10.0.19041
INFO - CPU: amd64
INFO -      family 6 model 85 stepping 7
INFO -      8 CPUs
INFO -
INFO - Crash reason:  EXCEPTION_BREAKPOINT
INFO - Crash address: 0x7ff8ae0df019
INFO - Process uptime: 2 seconds
INFO -
INFO - Thread 3 TaskController #2 (crashed)
INFO -  0  xul.dll!static
mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>,
bool) [SurfaceCache.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 1831 +
0x40]
INFO -     rax = 0x00007ff8b541d9f9   rdx = 0x0000000000000000
INFO -     rcx = 0x00007ff8e277c978   rbx = 0x0000000000000001
INFO -     rsi = 0x00000041a977f350   rdi = 0x00000177b0f6f740
INFO -     rbp = 0x00000177aa172130   rsp = 0x00000041a977f2b0
INFO -      r8 = 0x00000041a977f820    r9 = 0x00007ff8ea530000
INFO -     r10 = 0x00007ff8ea582651   r11 = 0x00000041a977ec70
INFO -     r12 = 0x00007ff8e26e9630   r13 = 0x00000177b3a15120
INFO -     r14 = 0x00000177b0f970d8   r15 = 0x00000177b28a2a00
INFO -     rip = 0x00007ff8ae0df019
INFO -     Found by: given as instruction pointer in context
INFO -  1  xul.dll!mozilla::image::DecodedSurfaceProvider::FinishDecoding()
[DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 200
+ 0x37]
INFO -     rbx = 0x0000000000000001   rbp = 0x00000177aa172130
INFO -     rsp = 0x00000041a977f310   r12 = 0x00007ff8e26e9630
INFO -     r13 = 0x00000177b3a15120   r14 = 0x00000177b0f970d8
INFO -     r15 = 0x00000177b28a2a00   rip = 0x00007ff8ae0b0dbf
INFO -     Found by: call frame info
INFO -  2  xul.dll!mozilla::image::DecodedSurfaceProvider::Run()
[DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 129
+ 0x7]
INFO -     rbx = 0x0000000000000001   rbp = 0x00000177aa172130
INFO -     rsp = 0x00000041a977f390   r12 = 0x00007ff8e26e9630
INFO -     r13 = 0x00000177b3a15120   r14 = 0x00000177b0f970d8
INFO -     r15 = 0x00000177b28a2a00   rip = 0x00007ff8ae0b0929
INFO -     Found by: call frame info


Only the new implementation will include the `--human` flag in the "Copy/paste:
Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe"
line, if you aren't sure which implementation is being used. However I
believe this line is omitted if fix-stacks.py is involved. In that case,
you can use the fact that the backtraces will end with "unimplemented
streams", a self-debugging feature unique to the new tool:

INFO -  Unimplemented streams encountered:
INFO -  Stream 0x00000000 UnusedStream (Official) @ 0x00000000
INFO -  Stream 0x00000015 SystemMemoryInfoStream (Official) @ 0x00002d98
INFO -  Stream 0x00000016 ProcessVmCountersStream (Official) @ 0x00002f84

These differences similarly apply to local crashes, as far as I know
(unverified).



*Nitty Gritty Details *

On paper, this should be a significant upgrade to try's minidump-stackwalk,
in that the one that is currently being used is a weird unmaintained fork
of what's used on crash-stats. The new one should be faster and more
reliable, and will be easy to update by just changing the commit in
its toolchain
fetch
<https://searchfox.org/mozilla-central/source/taskcluster/ci/fetch/toolchains.yml>
:

rust-minidump:
    description: rust-minidump source code (for minidump-stackwalk)
    fetch:
        type: git
        repo: https://github.com/luser/rust-minidump/
        revision: 0c90e02544797317503d1c4cff8daab0cabdea86

However this will introduce some changes to how minidump-stackwalk is built:


   - minidump-stackwalk no longer builds from checked-in source, so it will
   only need to be built if the toolchain fetch is updated (less builds and
   churn, yay!)
   - This introduces a currently orphaned win64-minidump-stackwalk build,
   for future use in solving Bug 1410840. That can be removed if having an
   orphan tool sets off some annoying warnings/errors for the sheriffs.
   - There was technically some wiring in `mach` to support building
   minidump-stackwalk locally. this no longer works, as there is no local
   source to use. mozboot was already downloading `minidump_stackwalk` for
   you, so it's unlikely this will affect anyone's workflow.
   - If for whatever reason you want to build your own copy of
   minidump-stackwalk, you can:
      -  `cargo install minidump-stackwalk`
      - or checkout the rust-minidump tree
      <https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk>
      and build it with `cargo build --release` (that's all our build
      servers do!)
      - NB: the rust binary is officially "minidump-stackwalk" but we
      rename it to "minidump_stackwalk" in CI to avoid needless churn,
if you do
      either of these things, you *will* get a binary called
      "minidump-stackwalk"



*Crash Analysis Power User Features*

The new minidump-stackwalk
<https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk>
should build and run locally on all mainstream platforms without any issue
via "cargo install minidump-stackwalk". I have done my best to write user
documentation
<https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk#analyzing-firefox-minidumps>
and the process for analyzing a firefox minidump is streamlined:

> minidump-stackwalk --symbols-url=https://symbols.mozilla.org/
/path/to/minidump.dmp

Because it is designed to be the backend for crash-stats (aka socorro), it
can do ~all the analysis you expect there (and more details are surfaced in
the default JSON output than the --human one used by "try" and local
builds).

To help me test this, I have also created a tool for downloading
minidumps/annotations from crash-stats and comparing local
minidump-stackwalk output to the values on crash-stats: socc-pair
<https://github.com/Gankra/socc-pair/>. While the comparison machinery is
probably not necessarily useful to you, as a side-effect of its purpose it
automates fetching all the details of a crash and running local analysis,
and saves the results to files you can search through for details:

      socc-pair --api-token=f0c129d4467bf58eeca0ad8e8e5d
--crash-id=b4f58e9f-49be-4ba5-a203-8ef160211027

<lots of interesting analysis>
...

Output Files:
  * Minidump:
C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.dmp
  * Socorro Processed Crash:
C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.json
  * Raw JSON:
C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.raw.json
  * Local minidump-stackwalk Output:
C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.local.json
  * Local minidump-stackwalk Logs:
C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.log.txt

*NOTE: these files can contain protected user information. Although they
are written to your system's default "temp", I recommend deleting the temp
`socc-pair` directory regularly to ensure compliance with our protected
data policies
<https://crash-stats.mozilla.org/documentation/protected_data_access/>.*

Notably this includes all of the logging rust-minidump did (in the
`.log.txt`), including tracing for the backtracer's analysis
<https://github.com/Gankra/socc-pair/#debugging-backtraces>, which can help
you debug strange backtraces:


[TRACE] unwind: starting stack unwind
[TRACE] unwind: unwinding NtGetContextThread
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind:   .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind:   .cfa: $rsp 8 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip:
0x000000ec00000000, caller_sp: 0x000000ec7fbfd790
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: unwinding 1013612281855
[TRACE] unwind: trying cfi
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd172c2a24, caller_sp:
0xec7fbfd7f8
[TRACE] unwind: unwinding <unknown in ntdll.dll>
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd162b7034, caller_sp:
0xec7fbfd828
[TRACE] unwind: unwinding BaseThreadInitThunk
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind:   .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind:   .cfa: $rsp 48 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip:
0x0000000000000000, caller_sp: 0x000000ec7fbfd858
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: instruction pointer was nullish, assuming unwind complete
[TRACE] unwind: finished stack unwind


If you are using minidump-stackwalk directly, you can get these same logs
with `--verbose=trace`. `--output-file=x/y/z` and `--log-file=x/y/z`
arguments are included to make it easier to pipe these streams to files.

-- 
You received this message because you are subscribed to the Google Groups 
"[email protected]" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/mozilla.org/d/msgid/dev-platform/CAHDr%2B1%3DmU519iChaj2dKNkdQdvZ2vpDFuWmf_W2pqD8%2BHh1Ymw%40mail.gmail.com.

[dev-platform] Heads Up: New Crash Analyzer (minidump-stackwalk) for "Try" and Local Builds

Reply via email to