dbartol wrote: @AaronBallman Your idea of having a tool to add a diagnostic got me thinking. Please bear with me...
I'll start with a couple of claims: ### Warnings only We should only really care about stable IDs for warnings, not errors/fatals/notes. While SARIF is capable of representing errors, I don't expect anyone to try to track instances of errors over time. Either your code compiles successfully or it doesn't. If it has errors, you have to fix them. The only reason why I think we should emit errors to the SARIF log at all is that having both errors and warnings in the same file makes it easier to use SARIF as a common structured diagnostic format to drive IDE support, CI reports, etc. So, while SARIF requires us to emit a `ruleId` property for each result, whether warning or error, I don't think we should make any effort to make the _error_ IDs stable for now. We seem to use notes only to annotate other warnings/errors. In SARIF, those are supposed to show up as `relatedLocations` on the root warning/error, rather than as their own `result` object. Thus, notes don't need SARIF IDs at all, stable or otherwise. ### Focus on stability and evolution The SARIF ID doesn't need to meaningful; it just needs to be unique and stable. So, instead of focusing on how to choose a stable ID for a warning, we should focus on how to ensure that the ID _remains_ stable. We should make creation of a stable ID trivial, but then detect when new warnings are added or removed, and ensure that the developer has taken any necessary steps to preserve the stable IDs or to record how the stable IDs have evolved. ## Proposal For errors, fatals, and notes, we use the in-source identifier of the diagnostic as its SARIF ID. We do not expect this to be stable, but we do not expect SARIF tools to care about the stability of error IDs over time. For warnings, we use the in-source identifier of the diagnostic as its SARIF ID by default. We also allow two additional properties: `StableId<id>` and `OldStableIds<[id, ...]`. `StableId<id>` specifies _id_ as the stable ID of the diagnostic, instead of the default. `OldStableIds<[id, ...]>` adds the listed IDs to the `deprecatedIds` property of the diagnostic's SARIF rule metadata. The effect on common contributor scenarios is as follows: - **Adding a new diagnostic (either warning or non-warning):** Same as today. Just add it, and we'll use the in-source identifier as its stable ID. - **Deleting an existing diagnostic (either warning or non-warning):** Same as today. Just delete it. - **Modifying a property of an existing warning _other than its in-source identifier_:** Same as today. - **Changing the in-source identifier of a warning:** Change the in-source identifier, but need to either explicitly specify `StableId<original-name>` (to keep the original stable ID) or `OldStableIds<[original-name]>` (to use the new in-source identifier as the stable ID while preserving history). - **Splitting a warning into multiple warnings:** Create the new warnings as today, but add `OldStableIds<[original-name]>` to each of them to preserve history. - **Combining multiple warnings into a single warning:** As today, but add `OldStableIds<[original-name-1, original-name-2...]>` to the combined warning to preserve history. ## Enforcement It should be easy enough to create a CI job that compares the set of diagnostics between the base commit of a PR and the head commit of that PR, and, if anything suspicious changes have occurred, posts instructions for how to deal with renaming/merging/splitting warnings. To give you an idea of how often this would trigger, I wrote a simple script to crawl our Git history, run `llvm-tblgen Diagnostic.td -dump-json` on each commit and its parent, and identify whether that commit added a new warning, deleted an existing warning, both, or neither. Crawling back from current `main` to the beginning of 2024, there were: - 96 commits that added one or more new warnings without deleting any existing warnings. These could be splitting an existing warning, but are usually just new warnings. - 15 commits that deleted one or more existing warnings without adding any new warnings. These could be merging existing warnings into one, but are usually just deleting warnings. - 13 commits that both added and deleted warnings. These often appear to be doing something complex involving renaming, merging, or splitting. - 240 commits that modified one of the `Diagnostic*.td` files without adding or deleting any warnings. (details [here](https://gist.github.com/dbartol/e0ebbeed741bf2ef46cad6dcb7477072)) So, the bot would fire ~5 times a month, and actually provoke additional work from the contributor maybe once every month or two. Other than that, the whole diagnostic process would remain the same as today, which still seems like less work than managing GUIDs for every diagnostic. https://github.com/llvm/llvm-project/pull/168153 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
