Re: Proposal: Roundtrip serialization of Cmm (parser-compatible pretty-printer output)

Hécate via ghc-devs Mon, 28 Jul 2025 09:04:18 -0700

Hi Diego,

Thank you very much for your work in this direction; it's sorely needed.

I'm all for having proper roundtrip correctness for Cmm, but I am not sure altering the parser is the way to go. In my opinion, GHC should produce valid textual Cmm that can be ingested by the parser as it is today.


Have a nice day,
Hécate

On 28/07/2025 at 02:16, Diego Antonio Rosario Palomino wrote:

Hello GHC devs,
I'm currently working on Cmm documentation and tooling improvements as part of my Google Summer of Code project. One of my core goals is to make Cmm roundtrip serializable.
Right now, the in-memory Cmm data structure (generated programmatically, e.g., from STG via GHC) can be pretty-printed, and Cmm can also be parsed. However, the pretty-printed version is not compatible with the parser. That is, we cannot take the output of the pretty printer and feed it directly back into the parser.
Example:

Parseable version:

    sum {
      cr:
        bits64 x;
        x = R1 + R2;
        R1 = x;
        jump %ENTRY_CODE(Sp(0))[R1];
    }

Pretty-printed version:

    sum() { // []
            { info_tbls: []
              stack_info: arg_space: 8
            }
        {offset
          cf: // global
              _ce::I64 = R1 + R2;
              R1 = _ce::I64;
              call (I64[Sp + 0 * 8])(R1) args: 8, res: 0, upd: 8;
        }
    }

Another example:

Parseable version:

    simple_sum_4 { // [R2, R1]
      cr: // global
        bits64 _cq;
        _cq = R2;
        bits64 _cp;
        _cp = R1;
        R1 = _cq + _cp;
        jump (bits64[Sp])[R1];
    }

Pretty-printed version:

    simple_sum_4() { // []
            { info_tbls: []
              stack_info: arg_space: 8
            }
        {offset
          cs: // global
              _cq::I64 = R2;
              _cr::I64 = R1;
              R1 = _cq::I64 + _cr::I64;
              call (I64[Sp])(R1) args: 8, res: 0, upd: 8;
        }
    }

While it’s possible to write parseable Cmm that resembles the pretty-printed version (and hence the internal ADT), the two don’t fully match, mainly because the parser inserts inferred fields using convenience functions.
Proposal:
To make roundtrip serialization possible, I propose supporting a new syntax that matches the pretty-printer output exactly.
There are a couple of design options:

1. Create a separate parser that accepts the pretty-printed syntax.
   Files could then use either the current parser or the new strict one.

2. Extend the current parser with a dedicated block syntax like:

       low_level_unwrapped { ... }

This second option is the one my mentor recommends, as it may better reflect GHC developers' preferences. In this mode, the parser would not insert any inferred data and would expect the input to match the pretty-printed form exactly.
This would enable a true roundtrip:

 * Compile Haskell to Cmm (in-memory AST)
 * Pretty-print it and write it to disk (wrapped in low_level_unwrapped { ... })
 * Later, read it back using the parser and continue with codegen
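The roundtrip requirement can be stated as a property: parsing the pretty-printer's output should give back exactly the value that was printed. Below is a minimal, self-contained Haskell sketch of that property on a hypothetical toy type, not GHC's actual Cmm types, parser, or pretty-printer. The names Proc, Mode, pretty, parseProc, dropParens, defaultArgSpace and roundtrips are all invented for illustration, and the default of 8 is an arbitrary value. The Inferring mode mimics a parser that fills in an omitted field with a convenience default; the Exact mode mimics the proposed low_level_unwrapped behaviour of requiring every field explicitly.

```haskell
-- Hypothetical toy model of the Cmm roundtrip problem; none of these
-- names exist in GHC. A "procedure" carries a single field (argSpace)
-- that a convenience-oriented parser would normally infer.
data Proc = Proc
  { procName :: String  -- assumed non-empty and free of spaces
  , argSpace :: Int
  } deriving (Eq, Show)

-- Parser modes: Inferring fills in omitted fields with a default;
-- Exact accepts only the fully explicit pretty-printed form.
data Mode = Inferring | Exact
  deriving (Eq, Show)

-- Arbitrary illustrative default for the inferred field.
defaultArgSpace :: Int
defaultArgSpace = 8

-- Pretty-printer: always writes every field out explicitly,
-- e.g. Proc "sum" 8 becomes "sum() { arg_space: 8 }".
pretty :: Proc -> String
pretty p = unwords [procName p ++ "()", "{", "arg_space:", show (argSpace p), "}"]

-- Strip a trailing "()" from a procedure header, if present.
dropParens :: String -> Maybe String
dropParens header = case reverse header of
  ')' : '(' : rest | not (null rest) -> Just (reverse rest)
  _ -> Nothing

parseProc :: Mode -> String -> Maybe Proc
parseProc Inferring src = case words src of
  -- Abbreviated surface syntax "name { }": argSpace is inferred, not read.
  [name, "{", "}"] -> Just (Proc name defaultArgSpace)
  _ -> Nothing
parseProc Exact src = case words src of
  -- Pretty-printed syntax: every field must be present in the input.
  [header, "{", "arg_space:", n, "}"]
    | Just name <- dropParens header
    , [(k, "")] <- reads n
    -> Just (Proc name k)
  _ -> Nothing

-- The roundtrip property: parsing the printed form yields the original.
roundtrips :: Mode -> Proc -> Bool
roundtrips mode p = parseProc mode (pretty p) == Just p
```

Loaded into GHCi, roundtrips Exact (Proc "sum" 8) evaluates to True, whereas roundtrips Inferring (Proc "sum" 8) evaluates to False, because the inferring parser only understands the abbreviated "sum { }" form. The toy deliberately ignores everything else the real pretty-printer emits (info tables, {offset blocks, typed locals such as _ce::I64); it only shows why a parser that infers fields cannot, on its own, close the loop.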

Optional future direction:
As a side note: currently the parser has both a “high-level” and a “low-level” mode. The low-level mode resembles the AST more closely but still inserts some inferred data.
If we introduce this new “exact” low-level form, it's possible the existing low-level mode could become redundant. We might then have:

 * High-level syntax
 * New low-level (exact) syntax
 * And possibly a deprecated form of the current low-level variant

I’d be interested in your thoughts on whether that direction makes sense.

Serialization libraries?
One technically possible, but likely unacceptable, alternative would be to derive serialization via a library like aeson. That would enable serializing and deserializing the Cmm AST directly. However, I understand that aeson adds a large dependency footprint and likely wouldn't be suitable for inclusion in GHC.
Final question:
Lastly, I’ve heard that parts of the Cmm pipeline may currently be undergoing refactoring. If that’s the case, could you point me to which parts (parser, pretty printer, internal representation, etc.) are being modified? I’d like to align my efforts accordingly and avoid conflicts.
Thanks very much for your time and input! I'm happy to iterate on this based on your feedback.
Best regards,
Diego Antonio Rosario Palomino
GSoC 2025 – Cmm Documentation & Tooling


_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


--
Hécate ✨
🐦: @TechnoEmpress
IRC: Hecate
WWW: https://glitchbra.in
RUN: BSD
