[clang] [clang] [diagnostics] Stable IDs for Clang diagnostics (PR #168153)

Aaron Ballman via cfe-commits Fri, 21 Nov 2025 06:00:52 -0800

AaronBallman wrote:

> @AaronBallman
> 
> > should we auto-generate a UUID...
> 
> To persist across builds, the IDs would either have to be in the `.td` files, 
> or they'd need to be generated by some hash of stable inputs. I don't think 
> we have any stable inputs to hash - the message, internal ID, and integer 
> value could all change.


Yeah, and we don't have a way to persist information across builds that I'm 
aware of.

> The SARIF stable ID doesn't have to be meaningful. If we went with UUIDs, 
> someone adding a new diagnostic could just generate one easily enough. We'd 
> just need a check somewhere to prevent duplicates in case of copy-paste 
> mistakes.
> 
> We could also just use explicitly-assigned, dense-ish stable integer IDs, 
> MSVC-style. Those are easier to use in both written and oral conversation 
> than UUIDs, but they might require slightly more process. We'd do an initial 
> pass to add them to the `.td` file, just like for UUIDs, and add the same 
> check for duplicates. Adding a new diagnostic would require picking an unused 
> integer ID. We'd probably have to keep a list of "retired" IDs in the code, 
> though, to prevent accidental reuse of a no-longer-generated diagnostic's ID 
> for a new diagnostic. The main problem with this approach would be dealing 
> with forks and merge conflicts. If two forks both add warning 1234 and then 
> try to push upstream, we'd have to have a rule like "first one into 
> `llvm-project/main` wins, everyone else has to renumber".

Yeah, that's why I didn't think we'd want to use dense integers. With UUIDs, 
downstreams have far less risk of conflicts.

> I suppose we could split the difference and use one of those code phrase 
> generators to make diagnostic IDs. Easier to use in conversation than UUIDs, 
> but less likely to collide than manually-assigned dense integers. The 
> downside is that at some point, we'll find ourselves writing something like 
> "We're getting too many false positive reports for purple-monkey-dishwasher, 
> so we'll need to tune the heuristcs."

Yeah, that does split the middle, but seems like a nightmare for people adding 
new diagnostics because they have to figure out a code phrase for each one, and 
I'm a bit worried about new contributor experience from that too.

> Does any one of the above approaches seem workable?

Maybe, but what about this as an idea: we have a script that generates a UUID 
and gives it a `constexpr` name that the script automatically adds to a 
standalone .td file. When you want to add a new diagnostic, you have to run the 
script and it tells you "your diagnostic's ID is <whatever the constexpr name 
is we generate>" and that gets added in the `Diagnostic*Kind.td` file with 
something like `StableID<Whatever>`. We could then decide that a `StableID` 
marking can go either on an individual diagnostic OR on a diagnostic group 
level; if on the diagnostic group level, then all diagnostics in the group are 
assumed to be "the same diagnostic", if we wanted that kind of control. The 
script would have to have some sort of reasonable naming scheme for the ID 
variables, but because those aren't user-facing, they can be kind of ugly but 
still better than a UUID; e.g., `StableID<Sema12344>` or something. We can help 
downstreams out by the script having a knob that downstreams can use to set 
their own starting point for IDs to help reduce conflicts. And the diagnostics 
tablegenerator can give an error if someone adds a diagnostic which doesn't 
have an ID (perhaps inherited from the group) or duplicates a UUID, to help 
catch mistakes.

What do you think of something along those lines? It's a bit heavier of a 
process to add a diagnostic than it is today, but we could actually make this a 
nicer script that does more than just SARIF IDs. e.g., maybe this script 
generates the full diagnostic for you. Like:
```
> gen-diag -part=sema -warning -default-error -name=do_not_do_bad_thing 
> -msg="this is the diagnostic message, isn't it %select{great|terrible}0?" 
> -group=bad-thing

def warn_do_not_do_bad_thing : Warning<
  "this is the diagnostic message, isn't it %select{great|terrible}0">,
  InGroup<DiagGroup<"bad-thing">>, DefaultError, StableID<Sema1234>;
```
(the tool could even automatically add the diagnostic text to the correct .td 
files directly instead of making people copy and paste it, or have a 
`--dry-run` option to show you what it would add to the file, etc.)

Maybe I'm scope-creeping too. :-D

https://github.com/llvm/llvm-project/pull/168153
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang] [diagnostics] Stable IDs for Clang diagnostics (PR #168153)

Reply via email to