Hi Paul,

Thanks for working on this, the previous state was clearly very unfortunate!

Do I understand correctly that, if I move something that takes 10MB per
process to shared memory, in a way that it uses a total of 20MB of shared
memory for all processes, that'd be reported as a MEMORY_TOTAL *increase*
on macOS (even though we improved memory usage)?
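
To make the arithmetic concrete, here's a rough sketch of what I think
happens. It assumes each process's footprint counts the whole shared region
in full, which is my reading of your description and may well be wrong. The
process count and the function names are made up for illustration.

```python
MB = 1024 * 1024


def true_total(private_per_process, shared_regions):
    """Memory actually in use: all private memory plus each shared region once."""
    return sum(private_per_process) + sum(shared_regions)


def summed_footprints(private_per_process, shared_regions):
    """Sum of per-process footprints.

    Assumption (not verified): every process's footprint includes every
    shared region in full, so shared memory is counted once per process.
    """
    shared = sum(shared_regions)
    return sum(private + shared for private in private_per_process)


NUM_PROCESSES = 8  # hypothetical process count

# Before: 10MB of private data in every process, nothing shared.
before_private = [10 * MB] * NUM_PROCESSES
before_shared = []

# After: the private copies are gone, replaced by one 20MB shared region.
after_private = [0] * NUM_PROCESSES
after_shared = [20 * MB]

for label, private, shared in (
    ("before", before_private, before_shared),
    ("after", after_private, after_shared),
):
    print(
        label,
        "true:", true_total(private, shared) // MB, "MB,",
        "summed:", summed_footprints(private, shared) // MB, "MB",
    )
```

With 8 processes, real usage drops from 80MB to 20MB, while the summed
figure goes from 80MB to 160MB.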

If so, it seems like a pretty big caveat... I guess it might be fine,
assuming we have precise memory reporting on other platforms? Still, it
seems like the kind of thing that's likely to make someone scratch their
head for a while until they find the right root cause (and then find
there's not much they can do about it).

Do we / should we have some documentation for this telemetry probe we could
update / create, to prevent knowledge about this quirk getting lost? Maybe
just a note around here
<https://searchfox.org/mozilla-central/rev/6c1a6e3263c2cae45b285b489ef340d944102262/toolkit/components/telemetry/Histograms.json#1237>
or a comment here
<https://searchfox.org/mozilla-central/rev/6c1a6e3263c2cae45b285b489ef340d944102262/xpcom/base/MemoryTelemetry.cpp#404>
or in the header could do... Not sure :)

Thanks!

 -- Emilio

On Sun, Sep 25, 2022 at 8:04 PM Paul Bone <[email protected]> wrote:

>
> G'day.
>
> If you care about changes to memory telemetry then read-on.
>
> TLDR: I changed how MEMORY_TOTAL is measured on MacOS and the values we
> record are going to increase.  This doesn't represent a true increase in
> memory usage however, just how it's measured, and only on MacOS.  Oh, and
> it's inaccurate, but it was inaccurate before this change and we don't
> have a good option, so we're going with least-bad, I hope.  If you want to
> know how much memory something uses about:memory is good.
>
> We've had some problems measuring memory usage on MacOS recently.  This
> started when https://bugzilla.mozilla.org/show_bug.cgi?id=1546442#c41 added
> a guard page within blocks of memory managed by jemalloc.  The guard page
> was added between the block's header and its payload.  We noticed that our
> "resident unique" memory usage halved.  That's not right!
>
> The cause was that munmap()ing a page (or mprotect()ing it) within a memory
> region breaks that region into multiple regions.  The problem was that the
> resulting regions are marked as shared, and our measurement of "resident
> unique" memory discounted them, thinking them to be shared.  By using
> different APIs within MacOS we can check whether they're really shared
> memory (between processes) or private memory that has been aliased into
> more than one mapping.
> https://bugzilla.mozilla.org/show_bug.cgi?id=1743781
> BTW, this is (almost) the most accurate way to measure memory.
>
> The problem however is that the new API is slow.  And not only is it slow
> but it seems to jank any other thread/process that may share memory
> mappings.  https://bugzilla.mozilla.org/show_bug.cgi?id=1779138
> That's okay for something like about:memory's memory report, where we still
> use it to calculate the resident-unique measurement.  But it's not okay for
> telemetry, which may run periodically and affect a user's experience.  We
> disabled MEMORY_TOTAL telemetry on MacOS temporarily.
>
> I've now clicked LAND on
> https://bugzilla.mozilla.org/show_bug.cgi?id=1786860 which re-enables the
> MEMORY_TOTAL telemetry, but using a different measurement.  It uses the
> "physical footprint" figure as calculated by MacOS.  This is a nice
> measurement when considering a single process, or when you ask MacOS to
> calculate it for a set of processes on the command line (too slow for us to
> use).  Exactly how it's calculated seems to be an "implementation detail",
> but you can read XNU sources if you like.  But the intention is that it
> represents memory that is "dirtied", in other words, that there would be a
> cost to swapping out if the kernel decided to do so.  It also includes
> shared memory, and that's the problem for Firefox telemetry: we query the
> physical footprint for each process and then add them together, meaning we
> over-count shared memory.  This is why MEMORY_TOTAL will now be larger on
> MacOS and won't be accurate (over-counting).  However it wasn't accurate
> before (completely ignoring shared memory AND counting a lot of private
> aliased memory as shared memory).
>
> All we can really say about MEMORY_TOTAL, before and after these changes is
> that if it's stable-over-time or trending downwards that's good. And if it's
> trending upwards that's possibly-not-good (but maybe we're using the memory
> to ship new useful features).
>
> What could we do going forward?
>
>  * We could account for the shared memory we know about (eg IPC) and
>    calculate it once when calculating MEMORY_TOTAL.
>
>  * We could do nothing, MEMORY_TOTAL was inaccurate before and the world
>    didn't end.  Maybe it's better now because you read this e-mail and now
>    *know* that it's inaccurate and won't make false assumptions.
>
>  * We could remove this telemetry to avoid it confusing anyone.
>
>
> --
> You received this message because you are subscribed to the Google Groups "
> [email protected]" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/a/mozilla.org/d/msgid/dev-platform/20220926060343.GA10084%40aluminium
> .
>

