RE: Introducing GHC whole program compiler (GHC-WPC)

Simon Peyton Jones via ghc-devs Mon, 15 Jun 2020 02:36:03 -0700

I’ve always thought that whole-program compilation has the possibility of doing 
optimisations that are simply inaccessible without the whole program, but been 
daunted by the engineering challenges of making WPC actually work.  So it’s 
fantastic that you’ve made progress on this.  Well done!


Questions that come to mind (to be understood in the context of the above 
enthusiasm):

  *   If you compile a program that depends on (say) lens, you get a lot of 
code.  Dead-code elim will drop lots, perhaps, but you start with everything.  
So what do memory footprints and compile times look like when you do WPC?   
Remembering that people often complain about GHC’s footprint when compiling a 
single module.

Also, WPC means that instead of just linking to precompiled libraries, you have 
to recompile (parts of) them.  What does that do to compile times?


  *   I love the 25% reduction in binary size. In fact I’m surprised it isn’t 
bigger.



  *   Why are you using STG?  It’s mostly untyped – or at least much less 
strongly typed than Core.  It has lots of restrictions (like ANF) that Core 
does not.   Indeed I think of STG as a little lily-pad to alight on in the hop 
from Core to Cmm.   Maybe your entire setup would work equally well with Core, 
provided you can serialise and deserialise it.


  *   Moreover, we *already* have a fast serialiser and deserialiser for Core – 
the stuff we use for interface files.  So maybe you could re-use that … no need 
for pretty-print and parse.



  *   You say “That would mean a significant conceptual shift in the GHC 
compiler pipeline, because heavy optimizations would be introduced at the low 
level IRs beside GHC Core.”  Fair enough, but what I’m missing is the rationale 
for doing heavy opts on STG rather than Core.



  *   Apart from (a) dead code, and (b) GRIN, do you have ideas in mind for 
what we could do with WPC?


Thanks

Simon




From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Csaba Hruska
Sent: 14 June 2020 13:46
To: Alexis King <lexi.lam...@gmail.com>
Cc: GHC developers <ghc-devs@haskell.org>
Subject: Re: Introducing GHC whole program compiler (GHC-WPC)

Hi,

I thought about the GHC-LTO project name before, but it would not be an 
accurate description though. The GHC-WPC in its current state is about 
exporting STG + linker info for later processing, either feed it back to GHC 
backend or to a third party pipeline. It depends what the user/researcher 
wants, the point is that GHC-WPC solves the IR export part of the issue. It is 
the external stg compiler that implements a (simple) whole program dead 
function elimination pass that I implemented as a proof of concept to show the 
new possibilities GHC-WPC opens up. But I plan to do much more optimization 
with sophisticated dataflow analyses. I.e. I have a fast and working 
implementation of control flow analysis in souffle/datalog that I plan to use 
to do more accurate dead code elimination and partial program 
defunctionalization on the whole program STG IR. In theory I could implement 
all GRIN optimizations on STG. That would mean a significant conceptual shift 
in the GHC compiler pipeline, because heavy optimizations would be introduced 
at the low level IRs beside GHC Core. I'd like to go even further with 
experimentation. I can imagine a dependently typed Cmm with a similar type 
system that ATS 
(http://www.ats-lang.org/MYDATA/VsTsVTs-2018-10-28.pdf<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.ats-lang.org%2FMYDATA%2FVsTsVTs-2018-10-28.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7Cd49efc7bddcb4e35d70808d81060fbbb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637277356582121415&sdata=lUgtWXs9etLZwQYbGDMWTt4Ea17YhsQLJpxRbN5iJSE%3D&reserved=0>)
 has. I definitely would like to make an experiment in the future, to come up 
with an Idirs2 EDSL for GHC RTS heap operations where the type system would 
ensure the correctness of pointer arithmetic and heap object manipulation. The 
purpose of GHC-WPC in this story is to deliver the IR for these stuff.

Beside exporting STG IR, the external STG compiler can compile STG via GHC's 
standard code generator. This makes GHC codegen/RTS available as a backend for 
programming language developers. I.e. Idris, Agda, Purescript could use 
GHC/STG/RTS as a backend with all of its cool features.

So these are the key parts of my vision about the purpose and development of 
GHC-WPC. It is meant to be more than a link time optimizer.

Cheers,
Csaba

On Sat, Jun 13, 2020 at 10:26 PM Alexis King 
<lexi.lam...@gmail.com<mailto:lexi.lam...@gmail.com>> wrote:
Hi Csaba,

I originally posted this comment on 
/r/haskell<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.reddit.com%2Fr%2Fhaskell%2Fcomments%2Fh7t8wr%2Fintroducing_ghc_whole_program_compiler_ghcwpc%2Ffuqdnye%2F&data=02%7C01%7Csimonpj%40microsoft.com%7Cd49efc7bddcb4e35d70808d81060fbbb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637277356582121415&sdata=2NcIa7wPP%2Bcwu%2B8%2FoctSe8pj%2Fsl9O5BGbNIbsxTfj5U%3D&reserved=0>
 before I saw you also sent this to ghc-devs. I’ve decided to reproduce my 
comment here as well, since this list probably has a more relevant audience:

I want to start by saying that I think this sounds totally awesome, and I think 
it’s a fantastic idea. I’m really interested in seeing how this progresses!

I do wonder if people might find the name a little misleading. “Whole program 
compilation” usually implies “whole program optimization,” but most of GHC’s 
key optimizations happen at the Core level, before STG is even generated. (Of 
course, I’m sure you’re well aware of that, I’m just stating it for the sake of 
others who might be reading who aren’t aware.)

This seems much closer in spirit to “link-time optimization” (LTO) as performed 
by Clang and GCC than whole program compilation. For example, Clang’s LTO works 
by “linking” LLVM bitcode files instead of fully-compiled native objects. STG 
is not quite analogous to LLVM IR—GHC’s analog would be Cmm, not STG—but I 
think that difference is not that significant here: the STG-to-Cmm pass is 
quite mechanical, and STG is mostly just easier to manipulate.

tl;dr: Have you considered naming this project GHC-LTO instead of GHC-WPC?

Alexis


On Jun 12, 2020, at 16:16, Csaba Hruska 
<csaba.hru...@gmail.com<mailto:csaba.hru...@gmail.com>> wrote:

Hello,

I've created a whole program compilation pipeline for GHC via STG.
Please read my blog post for the details:
Introducing GHC whole program compiler 
(GHC-WPC)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.patreon.com%2Fposts%2Fintroducing-ghc-38173710&data=02%7C01%7Csimonpj%40microsoft.com%7Cd49efc7bddcb4e35d70808d81060fbbb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637277356582131411&sdata=%2BF1KN0%2BuZbeW%2F3wTFOCvNVN9UWxY8wPhcahnN7Tx7DQ%3D&reserved=0>

Regards,
Csaba Hruska

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Introducing GHC whole program compiler (GHC-WPC)

Reply via email to