I want to clarify a few things.

1. "Socorro is not a strong platform for aggregate analysis of
crashes".

The issue here isn't Socorro itself, but rather what happens when
Firefox crashes and whether that triggers a crash report dialog with
a submit button the user can click. In many cases the user either
doesn't submit the report or never sees a dialog at all, so Socorro's
data is heavily biased and isn't a representative set of crash data.
It shouldn't be used for aggregate analysis, such as drawing
conclusions about overall health.

Do all products have the same problems that Firefox has such that
Socorro doesn't have a representative data set? I don't know.

The top crashers page does work and it is interesting. But because
of the Socorro data bias issue, and because we've got a whole team
working on Telemetry analysis tools while it's just me on Socorro, we
wanted to shift the model: aggregate analysis with Telemetry tools
and data, and deep-dive crash investigation in Socorro. That's where
things are headed. I don't think we're there now. For example, I
don't think there's a top crashers equivalent in Telemetry land yet.

2. Regarding "no long-term plan for investment for first-class support
of managed code", no one's ever asked for that before. If there's a
need for that and it's not doable elsewhere or there are compelling
reasons to do that on Socorro, we can change Socorro. Socorro handles
incoming Java crash reports in breakpad format, but that handling
isn't very robust because no one's been interested in making it
better. We don't have the same data with Java crash reports that we
have with breakpad crash reports, so I think this requires changes on
the crash reporter side.

Sentry does have a Sentry client for Java. That's definitely better
than what Crash reporter + Socorro does today. Socorro could get
better if that's the right move.

3. Regarding "William to investigate bare minidumps...", I think
there's a terminology issue here. "minidump" refers to the format that
the breakpad client uses to store information about the crashed
process, namely register values, the stack, and some other things. I
was going to investigate whether crash reports with minimal data in
them (i.e. no annotations and no minidumps) work and how they show up
in Socorro.

Turns out they show up fine:

https://crash-stats.allizom.org/report/index/620e0ca5-a49e-401e-95eb-240ed0180810

I only sent one, so that won't show up in top crashers, but it does
have the bits it needs to do so. That crash report was this:

    "ProductName": "Thunderbird"

in breakpad format. No other annotations. No minidumps.
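For anyone who wants to try something similar, here's a rough sketch
of how a report like that could be built. It assumes "breakpad format"
means an HTTP POST with a multipart/form-data body, one form field per
annotation, and no file part since there's no minidump. The function
name build_minimal_crash_report is made up for illustration, and the
sketch only builds the payload; it doesn't send it anywhere, because
the submission URL depends on which collector you're testing against.

```python
import uuid


def build_minimal_crash_report(annotations):
    """Encode annotations as a multipart/form-data body.

    Hypothetical helper: one form field per annotation and no
    minidump file part, which is the "bare" report described above.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in annotations.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    # Closing boundary marks the end of the multipart body.
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# The only annotation in the report I sent was ProductName.
headers, body = build_minimal_crash_report({"ProductName": "Thunderbird"})
```

The resulting headers and body could then be POSTed with whatever
HTTP client you like.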

I hope that helps!

If there's anything else I can do, let me know.

/will

On Thu, Aug 9, 2018 at 2:14 PM, Nicholas Alexander
<nalexan...@mozilla.com> wrote:
> Hi folks,
>
>
> On Thursday, August 8, James Willcox, Sebastian Kaspari, and I (Nick
> Alexander) met with William Kahn-Greene and Chris Lonnen to discuss using
> Socorro/crash-stats to analyse crashes in non-Firefox contexts.  You can
> read the mostly complete detailed notes.
>
>
> The worldview for determining stability
>
> Mozilla has been pushing towards a two pronged aggregate analysis approach:
>
> We use “crash pings”, which are a high-volume, low-specificity Telemetry
> signal to understand Firefox stability.
>
> We use “crash reports”, which are a low-volume, high-specificity,
> highly-biased Socorro signal to understand specific stability issues.
>
>
> The critical facts about Socorro, as I understand them:
>
> Socorro is excellent at processing minidumps
>
> Socorro has a robust security model for controlling access to PII
>
> Socorro is not a strong platform for aggregate analysis of crashes
>
>
> From this baseline, the discussion split into two contexts:
>
>
> Using Socorro to understand the health of the GeckoView-consuming Android
> App ecosystem (flagship browser for Android, Firefox Reality, potential
> third-party Apps)
>
> Technically possible right now: need ops to whitelist “application name”,
> but can aggregate minidumps across consuming Apps.
>
> Strong support for non-Gecko native code crashes (e.g., Firefox Reality);
> symbol server largely stands alone and scales to new uses.
>
> Minimal support right now, and no long-term plan for investment into future
> support, for first-class support of “managed code” reports (e.g., Java
> stacktraces or JavaScript stacktraces).
>
> Using Socorro to understand the health of non-Gecko/non-GeckoView Android
> Apps (Notes, Lockbox for Android).
>
> Socorro is not a good platform for understanding “managed code” reports.
>
> Considered better to push non-Gecko native code crashes through Sentry than
> to use Socorro: more problems in managed code than in native code long term.
>
>
> The immediate next steps are:
>
> William to investigate pushing “bare” minidumps to Socorro to understand
> what, e.g., Firefox Reality actually needs to upload
>
> James (snorp) to investigate pushing Gecko minidumps first to Sentry and
> then on to Socorro to understand whether, e.g., Focus (GeckoView) can
> leverage Sentry for managed code crashes while still feeding Gecko stability
> problems into the larger Socorro tool.
>
>
> Many thanks to William and Lonnen for their ongoing attention to this
> effort!  Everybody, please correct errors introduced by your interlocutor.
>
>
> Yours,
>
> Nick (for James, Sebastian, William, and Lonnen)
>
>
_______________________________________________
Dev-fxacct mailing list
Dev-fxacct@mozilla.org
https://mail.mozilla.org/listinfo/dev-fxacct
