[clang] [clang] [diagnostics] Stable IDs for Clang diagnostics (PR #168153)

Dave Bartolomeo via cfe-commits Sat, 22 Nov 2025 17:54:41 -0800

dbartol wrote:

@AaronBallman Your idea of having a tool to add a diagnostic got me thinking. 
Please bear with me...


I'll start with a couple of claims:

### Warnings only
We should only really care about stable IDs for warnings, not 
errors/fatals/notes. While SARIF is capable of representing errors, I don't 
expect anyone to try to track instances of errors over time. Either your code 
compiles successfully or it doesn't. If it has errors, you have to fix them. 
The only reason why I think we should emit errors to the SARIF log at all is 
that having both errors and warnings in the same file makes it easier to use 
SARIF as a common structured diagnostic format to drive IDE support, CI 
reports, etc. So, while SARIF requires us to emit a `ruleId` property for each 
result, whether warning or error, I don't think we should make any effort to 
make the _error_ IDs stable for now. 

We seem to use notes only to annotate other warnings/errors. In SARIF, those 
are supposed to show up as `relatedLocations` on the root warning/error, rather 
than as their own `result` object. Thus, notes don't need SARIF IDs at all, 
stable or otherwise.

### Focus on stability and evolution
The SARIF ID doesn't need to meaningful; it just needs to be unique and stable. 
So, instead of focusing on how to choose a stable ID for a warning, we should 
focus on how to ensure that the ID _remains_ stable.  We should make creation 
of a stable ID trivial, but then detect when new warnings are added or removed, 
and ensure that the developer has taken any necessary steps to preserve the 
stable IDs or to record how the stable IDs have evolved.

## Proposal
For errors, fatals, and notes, we use the in-source identifier of the 
diagnostic as its SARIF ID. We do not expect this to be stable, but we do not 
expect SARIF tools to care about the stability of error IDs over time.

For warnings, we use the in-source identifier of the diagnostic as its SARIF ID 
by default. We also allow two additional properties: `StableId<id>` and 
`OldStableIds<[id, ...]`. `StableId<id>` specifies _id_ as the stable ID of the 
diagnostic, instead of the default. `OldStableIds<[id, ...]>` adds the listed 
IDs to the `deprecatedIds` property of the diagnostic's SARIF rule metadata.

The effect on common contributor scenarios is as follows:

- **Adding a new diagnostic (either warning or non-warning):** Same as today. 
Just add it, and we'll use the in-source identifier as its stable ID.
- **Deleting an existing diagnostic (either warning or non-warning):** Same as 
today. Just delete it.
- **Modifying a property of an existing warning _other than its in-source 
identifier_:** Same as today.
- **Changing the in-source identifier of a warning:** Change the in-source 
identifier, but need to either explicitly specify `StableId<original-name>` (to 
keep the original stable ID) or `OldStableIds<[original-name]>` (to use the new 
in-source identifier as the stable ID while preserving history).
- **Splitting a warning into multiple warnings:** Create the new warnings as 
today, but add `OldStableIds<[original-name]>` to each of them to preserve 
history.
- **Combining multiple warnings into a single warning:** As today, but add 
`OldStableIds<[original-name-1, original-name-2...]>` to the combined warning 
to preserve history.

## Enforcement
It should be easy enough to create a CI job that compares the set of 
diagnostics between the base commit of a PR and the head commit of that PR, 
and, if anything suspicious changes have occurred, posts instructions for how 
to deal with renaming/merging/splitting warnings.

To give you an idea of how often this would trigger, I wrote a simple script to 
crawl our Git history, run `llvm-tblgen Diagnostic.td -dump-json` on each 
commit and its parent, and identify whether that commit added a new warning, 
deleted an existing warning, both, or neither. Crawling back from current 
`main` to the beginning of 2024, there were:

- 96 commits that added one or more new warnings without deleting any existing 
warnings. These could be splitting an existing warning, but are usually just 
new warnings.
- 15 commits that deleted one or more existing warnings without adding any new 
warnings. These could be merging existing warnings into one, but are usually 
just deleting warnings.
- 13 commits that both added and deleted warnings. These often appear to be 
doing something complex involving renaming, merging, or splitting.
- 240 commits that modified one of the `Diagnostic*.td` files without adding or 
deleting any warnings.

(details 
[here](https://gist.github.com/dbartol/e0ebbeed741bf2ef46cad6dcb7477072))

So, the bot would fire ~5 times a month, and actually provoke additional work 
from the contributor maybe once every month or two. Other than that, the whole 
diagnostic process would remain the same as today, which still seems like less 
work than managing GUIDs for every diagnostic.



https://github.com/llvm/llvm-project/pull/168153
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang] [diagnostics] Stable IDs for Clang diagnostics (PR #168153)

Reply via email to