PUBLIC

Yes, I was, since I am using 9.12.1.

This is great stuff -- with !13593 backported, I don’t see any 
ModuleGraph-related stuff showing up in top spots of the heap profile! Thank 
you for pointing me towards this!

I’ll have to do some more digging to see if there is anything else concretely 
blameable now. But in theory, morally, abstractly, from the 10,000-foot view,
would it be correct to say that there should be no memory usage difference 
between typechecking a large number of modules from scratch vs. loading the 
`.hi` files for those same modules? (Modulo, of course, the memory usage during 
any single module’s typechecking).

From: Matthew Pickering <matthewtpicker...@gmail.com>
Sent: Wednesday, April 2, 2025 6:03 PM
To: Erdi, Gergo <gergo.e...@sc.com>
Cc: GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>; Montelatici, 
Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze 
<atze.dijks...@sc.com>
Subject: [External] Re: GHC memory usage when typechecking from source vs. 
loading ModIfaces

I think you are missing 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13593

On HEAD I get maximum residency of about 200M, NodeKey usage is constant.

On 9.10.1, I get maximum residency of 400M, NodeKey usage looks quadratic.



On Wed, Apr 2, 2025 at 10:47 AM Erdi, Gergo 
<gergo.e...@sc.com> wrote:

zcat ../repro-hs.patch.gz | patch -p0

From: Matthew Pickering <matthewtpicker...@gmail.com>
Sent: Wednesday, April 2, 2025 5:39 PM
To: Erdi, Gergo <gergo.e...@sc.com>
Cc: GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>; Montelatici,
Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze
<atze.dijks...@sc.com>
Subject: [External] Re: GHC memory usage when typechecking from source vs. 
loading ModIfaces


What command do I run to generate the files from this patch file? Perhaps a 
link to a git repo would be a suitable way to share the reproducer?

On Wed, Apr 2, 2025 at 10:26 AM Erdi, Gergo 
<gergo.e...@sc.com> wrote:

Hi Matt,

I think I have something that demonstrates that GHC (at least GHC 9.12.1)
might have a similar problem!

With the attached vacuous module hierarchy, I tried compiling M2294 from 
scratch, and then with `.hi` files for everything except the toplevel module. I 
did the same with our GHC-API-using compiler as well. As you can see from the 
attached event logs, while the details differ, the overall shape of the memory 
used by ModuleGraph edges (750k of GWIB and NodeKey_Module constructors for the 
2321 ModuleNodes and ~60k direct dependency edges) is pretty much the same 
between our compiler and GHC 9.12, suggesting to me that GHC is duplicating 
ModuleGraph node information in the dependency edges when building the 
transitive closure.

Based on these measurements, do you agree that this is a GHC-side problem of 
memory usage scaling quadratically with the number of dependency edges?
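To make the scaling argument concrete, here is a toy model. It is a sketch, not GHC's `ModuleGraph` code: modules are plain `Int`s, and `chain`, `closures` and `totalEntries` are hypothetical helpers. If every module carries its own materialised set of transitive dependencies, a linear chain of n modules has only n - 1 direct edges but n(n - 1)/2 (module, dependency) pairs across those sets. `Data.Set` itself shares structure between sets built with `Set.insert`, so the pair count is an upper bound on what this particular code allocates; a representation that allocates a fresh key per pair would grow quadratically.

```
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- A dependency graph: each module (an Int here) maps to its direct dependencies.
type Graph = Map.Map Int [Int]

-- Hypothetical helper: a linear chain where module i imports module i-1.
chain :: Int -> Graph
chain n = Map.fromList [ (i, [i - 1 | i > 0]) | i <- [0 .. n - 1] ]

-- Materialise the transitive dependency set of every module.
-- Assumes dependencies have smaller keys than their importers, so when the
-- fold (in ascending key order) reaches m, its deps' sets are already built.
closures :: Graph -> Map.Map Int (Set.Set Int)
closures g = Map.foldlWithKey' step Map.empty g
  where
    step acc m deps =
      let below = Set.unions [ Set.insert d (acc Map.! d) | d <- deps ]
      in Map.insert m below acc

-- Total number of (module, transitive dependency) pairs stored.
totalEntries :: Graph -> Int
totalEntries = sum . map Set.size . Map.elems . closures

main :: IO ()
main = do
  let g = chain 100
  print (sum (map length (Map.elems g)))  -- number of direct edges
  print (totalEntries g)                  -- pairs across all closure sets
```

For n = 100 this prints 99 direct edges and 4950 pairs.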

Thanks,
            Gergo

p.s.: Sorry for including the reproducer module tree in this weird format as a 
patch file, but I am behind a mail server that won’t let me send mails with too 
many individual files in attached archives…

From: Matthew Pickering <matthewtpicker...@gmail.com>
Sent: Friday, March 28, 2025 8:40 PM
To: Erdi, Gergo <gergo.e...@sc.com>
Cc: GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>; Montelatici,
Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze
<atze.dijks...@sc.com>
Subject: [External] Re: GHC memory usage when typechecking from source vs. 
loading ModIfaces

Hi Gergo,

Do you have a (synthetic?) reproducer? You have probably identified some memory 
leak. However, without any means to reproduce it, it becomes very difficult to
investigate. I feel like we are getting into very precise details now, where 
speculating is not going to be so useful.

It seems like this is an important thing for you and your company. Is there any 
budget to pay for some investigation? If that was the case then some effort 
could be made to create a synthetic reproducer and make the situation more robust
going into the future if your requirements were precisely understood.

Cheers,

Matt

On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo 
<gergo.e...@sc.com> wrote:

Just to add that I get the same "equalizing" behaviour (but in a more "natural" 
way) if instead of deepseq-ing the ModuleGraph upfront, I just call 
`hugInstancesBelow` before processing each module. So that's definitely one 
source of extra memory usage. I wonder if it would be possible to rebuild the 
ModuleGraph periodically (similar to the ModDetails dehydration), or if there 
are references to it stored all over the place from `HscEnv`s scattered around 
in closures etc. (basically the same problem the HPT had before it was made 
into a mutable reference).
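The mutable-reference idea can be sketched with a toy environment. `Env`, `envGraph`, `ModGraph`, `mkQuery` and `rebuildGraph` are hypothetical names, not GHC's API. Because every copy of `Env` captured in a closure shares the same `IORef`, writing a freshly rebuilt graph into it is visible everywhere, and the old, heavily forced graph is no longer reachable through any captured `Env`. Had the graph been an immutable field, each captured copy would keep its own old graph alive.

```
import Data.IORef

-- Stand-in for a module graph; a real one would hold nodes and edges.
newtype ModGraph = ModGraph [String]

-- Hypothetical environment: the graph lives behind a mutable reference,
-- so copies of Env captured in closures all share the same IORef.
data Env = Env { envGraph :: IORef ModGraph }

-- A computation that captures the environment, as a typechecking thunk might.
mkQuery :: Env -> IO Int
mkQuery env = do
  ModGraph nodes <- readIORef (envGraph env)
  pure (length nodes)

-- Swap in a freshly built graph; the old one is no longer reachable via 'env'.
rebuildGraph :: Env -> ModGraph -> IO ()
rebuildGraph env = writeIORef (envGraph env)

main :: IO ()
main = do
  env <- Env <$> newIORef (ModGraph ["A", "B", "C"])
  let query = mkQuery env          -- captured before the rebuild
  query >>= print                  -- 3
  rebuildGraph env (ModGraph ["A"])
  query >>= print                  -- 1: the captured Env sees the new graph
```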

-----Original Message-----
From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Erdi, Gergo via ghc-devs
Sent: Friday, March 28, 2025 4:49 PM
To: Matthew Pickering <matthewtpicker...@gmail.com>; GHC Devs
<ghc-devs@haskell.org>
Cc: ÉRDI Gergő <ge...@erdi.hu>; Montelatici, Raphael Laurent
<raphael.montelat...@sc.com>; Dijkstra, Atze <atze.dijks...@sc.com>
Subject: [External] Re: GHC memory usage when typechecking from source vs. 
loading ModIfaces

Hi,

Unfortunately, I am forced to return to this problem. Everything below is now 
in the context of GHC 9.12 plus the mutable HPT patch backported.

My test case is typechecking a tree of 2294 modules that form the transitive 
closure of a single module's dependencies, all in a single process. I have done 
this typechecking three times, here's what `+RTS -s -RTS` reports for max 
residency:

* "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB

* "cold-top": With all `ModIface`s already on disk, except for the
  single top-level module: 302 MB

* "warm": With all `ModIface`s already on disk: 211 MB

So my stupidly naive question is, why is the "cold" case also not 302 MB?

In earlier discussion, `ModDetails` unfolding has come up. Dehydrating 
`ModDetails` in the HPT all the time is disastrous for runtime, but based on 
this model I would expect to see improvements from dehydrating "every now and 
then". So I tried a stupid simple example where after every 100th typechecked 
module, I run this function on the topologically sorted list of modules 
processed so far:


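```
{-# LANGUAGE BangPatterns, BlockArguments #-}
import Data.Foldable (for_)
import Data.IORef (readIORef)
-- HscEnv, hsc_HPT, HPT, lookupUDFM, HomeModInfo and initModDetails come from
-- the GHC API (GHC 9.12 plus the backported mutable-HPT patch).

-- Re-create the ModDetails of each listed home module from its ModIface.
dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
dehydrateHpt hsc_env mods = do
    let HPT{ table = hptr } = hsc_HPT hsc_env
    hpt <- readIORef hptr
    for_ mods \mod ->
        for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
            -- Forces the fresh ModDetails to WHNF only; it is not written back to the HPT.
            !details <- initModDetails hsc_env iface
            pure ()
```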

Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact, the 
profile looks exactly the same.

Speaking of the profile, in the "cold" case I see a lot of steadily increasing 
heap usage from the `ModuleGraph`. I could see this happening if typechecking 
from scratch involves more `modulesUnder` calls which in turn force more and 
more of the `ModuleGraph`. If so, then maybe this could be worked around by 
repeatedly remaking the `ModuleGraph` just like I remake the `ModDetails` 
above. I tried getting rid of this effect by `deepseq`'ing the `ModuleGraph` at 
the start, with the idea being that this should "equalize" the three scenarios 
if this really is a substantial source of extra memory usage. This pushes up 
the warm case's memory usage to 381 MB, which is promising, but I still see a 
`Word64Map` that is steadily increasing in the "cold-force-modulegraph" case 
and contributes a lot to the memory usage. Unfortunately, I don't know where 
that `Word64Map` is (it could be any `Unique`-keyed environment...).
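The "force it all upfront" experiment can be modelled in the same toy style, under the assumption stated above that queries like `modulesUnder` force parts of a lazily computed structure. `lazyClosures` is a hypothetical helper, not GHC code: its map values are thunks that are evaluated, and then retained, only when first looked up, so residency depends on which queries have run so far. Forcing the whole map with `force` from the `deepseq` package at the start makes that cost the same in every scenario.

```
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set

-- Lazily memoised transitive closures: each map value is a thunk that is
-- only evaluated (and then retained) the first time something looks it up.
lazyClosures :: Map.Map Int [Int] -> Map.Map Int (Set.Set Int)
lazyClosures g = memo
  where
    memo = Map.map below g
    below deps = Set.unions [ Set.insert d (memo Map.! d) | d <- deps ]

main :: IO ()
main = do
  let g = Map.fromList [ (i, [i - 1 | i > 0]) | i <- [0 .. 999 :: Int] ]
  -- Evaluate every closure upfront, so later lookups build no new sets and
  -- residency no longer depends on which queries happen to run.
  closures <- evaluate (force (lazyClosures g))
  print (Set.size (closures Map.! 999))
```

This prints 999, the number of modules below the top of the chain.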

So I am now stuck at this point. To spell out my goal explicitly, I would like 
to typecheck one module after another and not keep anything more in memory 
around than if I loaded them from `ModIface` files.

Thanks,
        Gergo

p.s.: I couldn't find a way in the EventLog output HTML to turn event markers 
on/off or filter them, so to avoid covering the whole graph with gray lines, I 
mark only every 100th module.




From: Matthew Pickering <matthewtpicker...@gmail.com>
Sent: Wednesday, February 12, 2025 7:08 PM
To: ÉRDI Gergő <ge...@erdi.hu>
Cc: Erdi, Gergo <gergo.e...@sc.com>; Zubin Duggal <zu...@well-typed.com>;
Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; GHC Devs
<ghc-devs@haskell.org>
Subject: [External] Re: GHC memory usage when typechecking from source vs. 
loading ModIfaces

You do also raise a good point about rehydration costs.

In oneshot mode, you are basically rehydrating the entire transitive closure of 
each module when you compile it, which obviously results in a large amount of 
repeated work. This is why people are investigating ideas of a persistent 
worker to at least avoid rehydrating all external dependencies as well.
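The repeated work in one-shot mode can be estimated with a back-of-the-envelope model. It is a simplification: rehydrating one interface counts as one unit of work, every dependency is treated alike (no split between home and external packages), and `transDeps`, `oneShotCost` and `workerCost` are hypothetical helpers. Each one-shot compilation pays for its module's whole transitive closure, whereas a persistent worker that caches rehydrated interfaces pays for each interface at most once.

```
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Graph = Map.Map Int [Int]

-- Hypothetical helper: a linear chain where module i imports module i-1.
chain :: Int -> Graph
chain n = Map.fromList [ (i, [i - 1 | i > 0]) | i <- [0 .. n - 1] ]

-- All modules reachable from m via imports (excluding m itself).
transDeps :: Graph -> Int -> Set.Set Int
transDeps g m = go Set.empty (Map.findWithDefault [] m g)
  where
    go seen [] = seen
    go seen (d : ds)
      | d `Set.member` seen = go seen ds
      | otherwise = go (Set.insert d seen) (Map.findWithDefault [] d g ++ ds)

-- One-shot mode: every compilation rehydrates its full transitive closure.
oneShotCost :: Graph -> Int
oneShotCost g = sum [ Set.size (transDeps g m) | m <- Map.keys g ]

-- Persistent worker: an interface, once rehydrated, stays in a shared cache.
workerCost :: Graph -> Int
workerCost g =
  Set.size (foldl' (\cache m -> Set.union cache (transDeps g m)) Set.empty (Map.keys g))

main :: IO ()
main = print (oneShotCost (chain 100), workerCost (chain 100))
```

For a chain of 100 modules this prints `(4950,99)`: n(n - 1)/2 rehydrations in one-shot mode versus n - 1 with a cache.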

On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering 
<matthewtpicker...@gmail.com> wrote:
Sure, you can remove them once you are sure they are not used anymore.

For clients like `GHCi` that obviously doesn't work, as the modules can be used
again at any point in the future, but for a batch compiler it would be fine.
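For a batch compiler, "once you are sure they are not used anymore" can be made concrete by computing, for each module, the last step of the compilation order at which anything that needs it is compiled. This is a hypothetical sketch, not GHC code: `evictionSchedule` and `needs` are illustrative names, and `needs` is supplied by the caller. Since typechecking can consult the interfaces of transitive (not just direct) dependencies, for example when collecting instances, a conservative caller would pass transitive dependencies.

```
import qualified Data.Map.Strict as Map

-- For each step i of a batch compilation order, the modules whose compiled
-- info could (hypothetically) be dropped right after step i finishes.
-- 'needs m' lists the modules whose info compiling m may consult.
evictionSchedule :: Ord m => [m] -> (m -> [m]) -> Map.Map Int [m]
evictionSchedule order needs =
  Map.fromListWith (++) [ (i, [m]) | (m, i) <- Map.toList lastUse ]
  where
    steps = zip [0 ..] order
    -- Last step at which each module is still needed: its own step, or the
    -- latest step of any module that needs it, whichever is later.
    lastUse = Map.fromListWith max
      ([ (m, i) | (i, m) <- steps ] ++ [ (d, i) | (i, m) <- steps, d <- needs m ])

main :: IO ()
main = do
  let order = ["A", "B", "C", "D"]
      -- Direct imports only, for illustration: B and C import A; D imports C.
      imports = Map.fromList [("B", ["A"]), ("C", ["A"]), ("D", ["C"])]
      needs m = Map.findWithDefault [] m imports
  print (evictionSchedule order needs)
```

Here `A` can be dropped after step 2, once `C`, its last importer, has been compiled, and `B` right after its own step. With transitive `needs`, a build whose single top-level module depends on everything else would evict nothing until that last step, so how much this saves depends on the shape of the graph.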

On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő 
<ge...@erdi.hu> wrote:
On Mon, 10 Feb 2025, Matthew Pickering wrote:

> I wonder if you have got your condition the wrong way around.
>
> The only "safe" time to perform rehydration is AFTER the point it can
> never be used again.
>
> If you rehydrate it just before it is used then you will repeat work
> which has already been done. If you do this, you will always have a
> trade-off between space used and runtime.

Oops. Yes, I have misunderstood the idea. I thought the idea was that after 
loading a given module into the HPT, its ModDetails would start out small 
(because of laziness) and then keep growing in size as more and more of it is
traversed, and thus forced, during the typechecking of its dependees, so at 
some point we would want to reset that into the small initial representation as 
created by initModDetails.

But if the idea is that I should rehydrate modules when they can't be used 
anymore, then that brings up the question of why even do that, instead of straight
removing the HomeModInfos from the HPT?

----------------------------------------------------------------------
This email and any attachments are confidential and may also be privileged. If 
you are not the intended recipient, please delete all copies and notify the 
sender immediately. You may wish to refer to the incorporation details of 
Standard Chartered PLC, Standard Chartered Bank and their subsidiaries together 
with Standard Chartered Bank’s Privacy Policy via our public website.

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
