aaron.ballman added inline comments.
================ Comment at: clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp:99 + + if (!Current->isLiteral() || isStringLiteral(Current->getKind()) || + !isIntegralConstant(*Current)) { ---------------- aaron.ballman wrote: > LegalizeAdulthood wrote: > > LegalizeAdulthood wrote: > > > aaron.ballman wrote: > > > > LegalizeAdulthood wrote: > > > > > aaron.ballman wrote: > > > > > > LegalizeAdulthood wrote: > > > > > > > aaron.ballman wrote: > > > > > > > > LegalizeAdulthood wrote: > > > > > > > > > aaron.ballman wrote: > > > > > > > > > > I know this is code moved from elsewhere, but I suppose we > > > > > > > > > > never considered the odd edge case where a user does > > > > > > > > > > something like `"foo"[0]` as a really awful integer > > > > > > > > > > constant. :-D > > > > > > > > > It's always possible to write crazy contorted code and have a > > > > > > > > > check not recognize it. I don't think it's worthwhile to > > > > > > > > > spend time trying to handle torture cases unless we can find > > > > > > > > > data that shows it's prevalent in real world code. > > > > > > > > > > > > > > > > > > If I was doing a code review and saw this: > > > > > > > > > ``` > > > > > > > > > enum { > > > > > > > > > FOO = "foo"[0] > > > > > > > > > }; > > > > > > > > > ``` > > > > > > > > > I'd flag that in a code review as bogus, whereas if I saw: > > > > > > > > > ``` > > > > > > > > > enum { > > > > > > > > > FOO = 'f' > > > > > > > > > }; > > > > > > > > > ``` > > > > > > > > > That would be acceptable, which is why character literals are > > > > > > > > > accepted as an integral literal initializer for an enum in > > > > > > > > > this check. > > > > > > > > > I don't think it's worthwhile to spend time trying to handle > > > > > > > > > torture cases unless we can find data that shows it's > > > > > > > > > prevalent in real world code. > > > > > > > > > > > > > > > > I think I'm okay agreeing to that in this particular case, but > > > > > > > > this is more to point out that writing your own parser is a > > > > > > > > maintenance burden. Users will hit cases we've both forgotten > > > > > > > > about here, they'll file a bug, then someone has to deal with > > > > > > > > it. It's very hard to justify to users "we think you write > > > > > > > > silly code" because they often have creative ways in which > > > > > > > > their code is not actually so silly, especially when we support > > > > > > > > "most" valid expressions. > > > > > > > Writing your own parser is unavoidable here because we can't just > > > > > > > assume that any old thing will be a valid initializer just by > > > > > > > looking at the set of tokens present in the macro body. (There > > > > > > > is a separate discussion going on about improving the > > > > > > > preprocessor support and parsing things more deeply, but that > > > > > > > isn't even to the point of a prototype yet.) The worst thing we > > > > > > > can do is create "fixits" that produce invalid code. > > > > > > > > > > > > > > The worst that happens if your expression isn't recognized is > > > > > > > that your macro isn't matched as a candidate for an enum. You > > > > > > > can always make it an enum manually and join it with adjacent > > > > > > > macros that were recognized and converted. > > > > > > > > > > > > > > As it stands, the check only recognizes a single literal with an > > > > > > > optional unary operator. > > > > > > > > > > > > > > This change expands the check to recognize a broad range of > > > > > > > expressions, allowing those macros to be converted to enums. I > > > > > > > opened the issue because running modernize-macro-to-enum on the > > > > > > > [[ https://github.com/InsightSoftwareConsortium/ITK | ITK > > > > > > > codebase ]] showed some simple expressions involving literals > > > > > > > that weren't recognized and converted. > > > > > > > > > > > > > > If an expression isn't recognized and an issue is opened, it will > > > > > > > be an enhancement request to support a broader range of > > > > > > > expression, not a bug that this check created invalid code. > > > > > > > > > > > > > > IMO, the more useful thing that's missing from the grammar is > > > > > > > recognizing `sizeof` expressions rather than indexing string > > > > > > > literals with an integral literal subscript. > > > > > > > > > > > > > > I had planned on doing another increment to recognize `sizeof` > > > > > > > expressions. > > > > > > > Writing your own parser is unavoidable here because we can't just > > > > > > > assume that any old thing will be a valid initializer just by > > > > > > > looking at the set of tokens present in the macro body. > > > > > > > > > > > > If you ran the token sequence through clang's parser and got an AST > > > > > > node out, you'd have significantly *more* information as to whether > > > > > > something is a valid enum constant initializer because you can > > > > > > check that it's an actual constant expression *and* that it's > > > > > > within a valid range of values. This not only fixes edge case bugs > > > > > > with your approach (like the fact that you can generate a series of > > > > > > literal expressions that result in a value too large to store > > > > > > within an enumerator constant), but it enables new functionality > > > > > > your approach currently disallows (like using constexpr variables > > > > > > instead of just numeric literals). > > > > > > > > > > > > So I don't agree that it's unavoidable to write another custom > > > > > > parser. > > > > > You keep bringing up the idea that the values have to be known, but > > > > > so far they don't. > > > > > > > > > > Remember, we are replacing macro identifiers with anonymous enum > > > > > identifiers. We aren't specifying a restricting type to the enum, so > > > > > as long as it's a valid integral literal expression, we're not > > > > > changing any semantics. Unscoped enums also allow arbitrary > > > > > conversions to/from an underlying integral type chosen by the > > > > > compiler. > > > > > > > > > > C++20 9.7.1 paragraph 7 says: > > > > > > > > > > > For an enumeration whose underlying type is not fixed, the > > > > > > underlying type is an integral type that can > > > > > > represent all the enumerator values defined in the enumeration. If > > > > > > no integral type can represent all the > > > > > > enumerator values, the enumeration is ill-formed. It is > > > > > > implementation-defined which integral type is used > > > > > > as the underlying type except that the underlying type shall not be > > > > > > larger than int unless the value of an > > > > > > enumerator cannot fit in an int or unsigned int . If the > > > > > > enumerator-list is empty, the underlying type is as > > > > > > if the enumeration had a single enumerator with value 0. > > > > > > > > > > So the compiler is free to pick an underlying type that's large > > > > > enough to handle all the explicitly listed initial values. Do we > > > > > actually need to know the values for this check? I don't think so, > > > > > because we aren't changing anything about the type of the named > > > > > values. When the compiler evaluates an integral literal, it goes > > > > > through a similar algorithm assigning the appropriate type to those > > > > > integral values: > > > > > > > > > > C++20 5.9 paragraph 2: > > > > > > > > > > > A preprocessing number does not have a type or a value; it acquires > > > > > > both after a successful conversion to an > > > > > > integer-literal token or a floating-point-literal token. > > > > > > > > > > C++20 5.13.2 paragraph 3: > > > > > > > > > > > The type of an integer-literal is the first type in the list in > > > > > > Table 8 corresponding to its optional integer-suffix > > > > > > in which its value can be represented. > > > > > > > > > > The table says the type is int, unsigned int, long int, unsigned long > > > > > int, long long int, or unsigned long long int based on the suffix and > > > > > the value and that the type is chosen to be big enough to hold the > > > > > value if the suffix is unspecified. > > > > > > > > > > > but [using `clangParse`] enables new functionality your approach > > > > > > currently disallows (like using constexpr variables instead of just > > > > > > numeric literals). > > > > > > > > > > I agree that if we used the full parser, we'd bring in `constexpr` > > > > > expressions as valid initializers for the enums. However, before > > > > > engaging in all that work, I'd like to see how likely this is in > > > > > existing codebases by feedback from users requesting the support. > > > > > Maybe engaging the parser isn't a big amount of work, I don't > > > > > actually know. I've never looked deeply at the actual parsing code > > > > > in clang. Maybe it's easy enough to throw a bag of tokens at it and > > > > > get back an AST node, maybe not. (I suspect not based on my > > > > > experience with the code base so far.) > > > > > > > > > > My suspicion is that code bases that are heavy with macros for > > > > > constants //aren't// using modern C++ in the body of those macros to > > > > > define the values of those constants. Certainly this is 100% true > > > > > for C code that uses macros to define constants, by definition. This > > > > > check applies equally well to C code as C has had enums forever but > > > > > even recent C code still tends to use macros for constants. > > > > > > > > > > Still, my suspicions aren't data. I'd like to get this check > > > > > deployed in a basic fashion and let user feedback provide data on > > > > > what is important. > > > > > > > > > > > So I don't agree that it's unavoidable to write another custom > > > > > > parser. > > > > > > > > > > That's a fair point. //Some// kind of parser is needed to recognize > > > > > valid initializer expressions or we run the risk of transforming > > > > > valid code into invalid code. Whether it is a custom recognizer as > > > > > I've done or `clangParse` is what we're debating here. > > > > > You keep bringing up the idea that the values have to be known, but > > > > > so far they don't. > > > > > > > > See comments at the top level. > > > > > > > > > So the compiler is free to pick an underlying type that's large > > > > > enough to handle all the explicitly listed initial values. Do we > > > > > actually need to know the values for this check? > > > > > > > > Yes, C requires the enumeration constants to be representable with > > > > `int`. But also, because this is in the `modernize` module, it's very > > > > likely we'll be getting a request to convert to using a scoped > > > > enumeration or an enumeration with the appropriate fixed underlying > > > > type in C++ as well. > > > Oh, I see now, thanks for explaining it. I didn't realize that C > > > restricts the values to `int`. > > Regarding conversion to a scoped enum, I think that is best handled by a > > separate enum-to-scoped-enum check. I have one I've been working on > > separately. As bad as it is to convert macros (since they have no respect > > for structure or scope), it's quite a bit of work to convert a non-scoped > > enum to an enum because now implicit conversions enter the picture and > > expressions involving macros (e.g. `FLAG_X | FLAG_Y`) also get much more > > complicated. Not only that but usages have to have types updated. I don't > > think it's very useful to upgrade to a scoped enum and then have every use > > wrapped in `static_cast<int>()`. It just creates uglier code than what was > > there before and I don't think people would adopt such a check. > > > > Regarding conversion to an appropriate fixed underlying type, that isn't > > allowed on unscoped enums, only on scoped enums, so it has all the above > > complexity plus selecting the appropriate fixed underlying type. > You're welcome, sorry for not pointing it out sooner! > Regarding conversion to a scoped enum, I think that is best handled by a > separate enum-to-scoped-enum check. It's been a while since I checked, but I recall that checks with interacting fix-its tend not to play well together. We should probably see if that's still the case today. As an example, if the enum-to-scoped-enum check runs BEFORE the modernize-macros-to-enum check, then the behavior will be worse than if the checks are run in the reverse order. Because of issues like that, I'm not quite as convinced that a separate check is best (though I do agree it's notionally better). > Regarding conversion to an appropriate fixed underlying type, that isn't > allowed on unscoped enums, only on scoped enums, so it has all the above > complexity plus selecting the appropriate fixed underlying type. That's incorrect; fixed underlying types and scoped enumerations are orthogonal features (though a scoped enumeration always has a fixed underlying type): https://godbolt.org/z/sGYsjdnrT CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124500/new/ https://reviews.llvm.org/D124500 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits