AaronBallman wrote:
> @AaronBallman
>
> > should we auto-generate a UUID...
>
> To persist across builds, the IDs would either have to be in the `.td` files,
> or they'd need to be generated by some hash of stable inputs. I don't think
> we have any stable inputs to hash - the message, internal ID, and integer
> value could all change.
Yeah, and we don't have a way to persist information across builds that I'm
aware of.
> The SARIF stable ID doesn't have to be meaningful. If we went with UUIDs,
> someone adding a new diagnostic could just generate one easily enough. We'd
> just need a check somewhere to prevent duplicates in case of copy-paste
> mistakes.
>
> We could also just use explicitly-assigned, dense-ish stable integer IDs,
> MSVC-style. Those are easier to use in both written and oral conversation
> than UUIDs, but they might require slightly more process. We'd do an initial
> pass to add them to the `.td` file, just like for UUIDs, and add the same
> check for duplicates. Adding a new diagnostic would require picking an unused
> integer ID. We'd probably have to keep a list of "retired" IDs in the code,
> though, to prevent accidental reuse of a no-longer-generated diagnostic's ID
> for a new diagnostic. The main problem with this approach would be dealing
> with forks and merge conflicts. If two forks both add warning 1234 and then
> try to push upstream, we'd have to have a rule like "first one into
> `llvm-project/main` wins, everyone else has to renumber".
Yeah, that's why I didn't think we'd want to use dense integers. With UUIDs,
downstreams have far less risk of conflicts.
> I suppose we could split the difference and use one of those code phrase
> generators to make diagnostic IDs. Easier to use in conversation than UUIDs,
> but less likely to collide than manually-assigned dense integers. The
> downside is that at some point, we'll find ourselves writing something like
> "We're getting too many false positive reports for purple-monkey-dishwasher,
> so we'll need to tune the heuristcs."
Yeah, that does split the middle, but seems like a nightmare for people adding
new diagnostics because they have to figure out a code phrase for each one, and
I'm a bit worried about new contributor experience from that too.
> Does any one of the above approaches seem workable?
Maybe, but what about this as an idea: we have a script that generates a UUID
and gives it a `constexpr` name that the script automatically adds to a
standalone .td file. When you want to add a new diagnostic, you have to run the
script and it tells you "your diagnostic's ID is <whatever the constexpr name
is we generate>" and that gets added in the `Diagnostic*Kind.td` file with
something like `StableID<Whatever>`. We could then decide that a `StableID`
marking can go either on an individual diagnostic OR on a diagnostic group
level; if on the diagnostic group level, then all diagnostics in the group are
assumed to be "the same diagnostic", if we wanted that kind of control. The
script would have to have some sort of reasonable naming scheme for the ID
variables, but because those aren't user-facing, they can be kind of ugly but
still better than a UUID; e.g., `StableID<Sema12344>` or something. We can help
downstreams out by the script having a knob that downstreams can use to set
their own starting point for IDs to help reduce conflicts. And the diagnostics
tablegenerator can give an error if someone adds a diagnostic which doesn't
have an ID (perhaps inherited from the group) or duplicates a UUID, to help
catch mistakes.
What do you think of something along those lines? It's a bit heavier of a
process to add a diagnostic than it is today, but we could actually make this a
nicer script that does more than just SARIF IDs. e.g., maybe this script
generates the full diagnostic for you. Like:
```
> gen-diag -part=sema -warning -default-error -name=do_not_do_bad_thing
> -msg="this is the diagnostic message, isn't it %select{great|terrible}0?"
> -group=bad-thing
def warn_do_not_do_bad_thing : Warning<
"this is the diagnostic message, isn't it %select{great|terrible}0">,
InGroup<DiagGroup<"bad-thing">>, DefaultError, StableID<Sema1234>;
```
(the tool could even automatically add the diagnostic text to the correct .td
files directly instead of making people copy and paste it, or have a
`--dry-run` option to show you what it would add to the file, etc.)
Maybe I'm scope-creeping too. :-D
https://github.com/llvm/llvm-project/pull/168153
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits