On Sat, 2 Feb 2019, Martin Sebor wrote:

> I don't feel too strongly about whether -Wmaybe-uninitialized should
> be in -Wall or in -Wextra, and I might even be somewhat more inclined
> to expect to find it in the latter.  But since you sound like you are
> gearing up to propose other changes in the same vein down the line,
> where I might have stronger opinions, I should comment.

> [It's a bit of a long-winded response because I've been thinking about
> this topic a lot lately.]

Thank you for taking the time to write this down.

> In general, I think a discussion of warning groupings is worth having
> (even needed) at the right time, but stage 4 doesn't strike me as
> the most opportune occasion.

> Specifically for -Wmaybe-uninitialized, the option has been in -Wall
> for many releases, and no major changes to it have been made recently
> that I know of.  So what I'm missing in this proposal is: why now?
> What has changed to make this a pressing issue now?  Has its rate of
> false positives gone up significantly?  If so, by how much and why?

I've been wanting to post this for years, but I only got motivated
enough recently, after seeing the negative effects of
-Wmaybe-uninitialized in two projects within a few days. I don't think
the rate of false positives has gone up significantly in gcc-9; they
just randomly appear or disappear from one release to the next. In
some sense, for projects that support multiple compiler versions, each
new gcc release makes the number of false positives (on at least one
version) go up, but that doesn't count as the warning getting worse.

The discussion and/or the change don't need to happen now, but if I didn't post this patch while motivated, it might be years before I actually do it, so I ignored stage 4.

> Please be clear about what you mean by false positives.  Is it that
> the warning triggers contrary to the documentation ("a path exists
> where the variable is uninitialized along with one where it is
> initialized"), or that the path to the use of the variable does exist
> but we (though not GCC) can tell from the context that it cannot be
> reached?

The latter.

> (The latter wouldn't qualify as a false positive as the term is
> defined in literature; i.e., the warning works as designed, we just
> don't like or agree with it.)

Indeed, the definition of a false positive is important. In some
sense, no warning is ever a false positive when its definition is "the
compiler decided to warn", but that definition makes the term useless.
-Wmaybe-uninitialized does not say there is an uninitialized use; it
says the control flow is too complicated or the compiler not clever
enough (or there is indeed an uninitialized use for some input, and
only this subcase is a bug). Another warning, -Wstrict-overflow (after
it moved to the middle-end), was more of an optimization note than a
true warning: "warning: assuming your code is valid when generating
code". Duh! Was I supposed to make it invalid to quiet the warning?
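
To make this concrete, here is a minimal sketch (made up for this
message, not taken from either project) of the pattern behind most of
the reports I see. Depending on the GCC version, the optimization
level and the surrounding code, GCC may or may not warn that 'x' may
be used uninitialized, even though the use is guarded by the same
condition as the initialization:

    int f(int cond, int v)
    {
        int x;              /* deliberately left uninitialized */
        if (cond)
            x = v;
        /* ... enough intervening code here and the optimizers lose
           track of the fact that both branches test 'cond' ... */
        if (cond)
            return x;       /* only reached when x was assigned */
        return 0;
    }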

> In practice, false positives (and negatives) of both kinds, whether
> they fit the formal definition or the informal one, are the nature
> of virtually all non-trivial static diagnostics, certainly all those
> that depend on control or data flow analysis.  Some are due to bugs
> or limitations in the implementation of the warning.  Others are
> inherent in the technology.

Yes, and I argue that such warnings belong in a different "level" than
the trivial ones.

> Is this warning more prone to one kind than others?  If so, is it
> because it's implemented poorly, or that its implementation hasn't
> kept up with the improvements to the optimizers, or has the
> infrastructure it depends on become ill-suited for it to avoid some
> of the false positives (as formally defined), or is it something
> else?

I am mostly looking at what I see in practice. I regularly see false
positives from -Wstrict-overflow (fewer now that I neutered parts of
it) and -Wmaybe-uninitialized. And when I have strange crashes that
later turn out to be caused by uninitialized memory uses, it is very
seldom that -Wmaybe-uninitialized helps detect them. IIRC, last time,
I had slightly more success with -Wreturn-local-addr -O3
-fkeep-inline-functions -fkeep-static-functions.
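
For the record, the kind of bug that combination catches looks roughly
like this (a simplified sketch, not the actual code); the -fkeep-*
flags matter because without them a small helper can be inlined or
discarded before the warning gets a chance to fire:

    int *dangling(void)
    {
        int local = 42;
        return &local;  /* -Wreturn-local-addr: address of a local escapes */
    }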

I think several of the new middle-end warnings you added are about string manipulation. I don't do any of that, which may explain why I am not seeing them. Also, we are not using gcc-9 much yet.

>> These false positives are not easy to avoid, as required to be part
>> of -Wall. Avoiding them, when it is possible at all, requires not
>> just a syntactic tweak, like adding parentheses, but a semantic
>> change that can make the code worse. Initializing something that
>> does not need it is extra code (increases code size and running
>> time). It also prevents better tools from detecting true
>> uninitialized uses, either static analyzers or runtime checkers
>> (sanitizer, valgrind).

> I don't find the argument very compelling that diagnosing potential
> bugs should be avoided because the fix (or the suppression in
> the case of a false positive) could be wrong.  The risk exists with
> all diagnostics.

Yes, but in practice I haven't seen people get the parentheses wrong after a warning from -Wparentheses.

> I also take issue with the suggestion that dynamic analysis is
> necessarily a superior mechanism for detecting bugs.  They each have
> their strengths and weaknesses.  They are not an either-or
> proposition but rather complementary solutions.

I didn't claim it was necessarily superior, and I started my
enumeration of potentially better tools with static analyzers, i.e.,
tools like gcc's -Wmaybe-uninitialized but really focused on the
analysis rather than on optimization. Dynamic analysis has the
advantage that it avoids dead branches, and thus a whole class of
false positives, but it is indeed also a weakness that it misses the
many possible paths not covered during testing. I agree 100% that they
are complementary.

> Lastly, in the case of uninitialized variables, the usual solution
> of initializing them is trivial and always safe (some coding styles
> even require it).

Here it shows that we don't work with the same type of code at all. If
I am using a boost::optional, i.e. a class containing a buffer and a
boolean that says whether the buffer is initialized, how do I
initialize the (private) buffer? Or should boost itself zero out the
buffer whenever the boolean is set to false? The variables can easily
be hidden behind dozens of levels of abstraction that make them hard
to initialize, and there may be nothing meaningful to initialize them
with. In gcc, "uninitialized" also covers clobbered variables (gone
out of scope, for instance), where it isn't clear what you are
supposed to initialize to quiet the warning.
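
To illustrate, here is a rough sketch of the layout I have in mind
(not the real boost implementation): there is no meaningful value to
put in the buffer while the flag is false, and zeroing it would cost
cycles and hide real bugs from the runtime checkers.

    #include <new>
    #include <utility>

    template <class T>
    class optional_sketch {
        alignas(T) unsigned char buf_[sizeof(T)]; // deliberately uninitialized
        bool engaged_ = false;                    // does buf_ hold a T?
    public:
        template <class... Args>
        void emplace(Args&&... args) {
            ::new (static_cast<void *>(buf_)) T(std::forward<Args>(args)...);
            engaged_ = true;
        }
        // Only valid when engaged_ is true; there is nothing sensible
        // to "initialize" buf_ with while it is false.
        T &value() { return *reinterpret_cast<T *>(buf_); }
        ~optional_sketch() { if (engaged_) value().~T(); }
    };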

By the way, we usually try to avoid imposing a coding style, although that's not always true.

> Initializing scalars has a negligible performance
> impact in practice, and, if it's thought to be necessary, can easily
> be done conditionally (as in, based on some macro) so that
> the uninitialized accesses can still be detected by dynamic analysis.

I'd rather avoid increasing yet again the use of macros, but yes, when
initializing to quiet what has been identified as a false positive,
that seems desirable. And the variables can be arrays instead of
scalars.
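
Something like the following is what I understand you to mean (the
macro name is made up for the example): builds that want a quiet
-Werror define the macro, while other builds leave the variable
uninitialized so that valgrind or MemorySanitizer can still catch a
real bug.

    /* hypothetical macro, for illustration only */
    #ifdef QUIET_FALSE_POSITIVES
    # define MAYBE_UNINIT = 0
    #else
    # define MAYBE_UNINIT
    #endif

    int g(int cond, int v)
    {
        int x MAYBE_UNINIT; /* initialized only to silence the warning */
        if (cond)
            x = v;
        return cond ? x : 0;
    }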

>> This message concentrates on the negatives, but that doesn't mean I
>> consider -Wmaybe-uninitialized as useless. It can find true
>> uninitialized uses. And even some false positives can point at
>> places where we can help the compiler generate better code (say
>> with a default __builtin_unreachable case in a switch). I did in
>> the past contribute patches to make it warn more often, and I might
>> do so again in the future.
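
(To expand on the __builtin_unreachable parenthetical above with a
concrete sketch: if e is known to take only the values 0 to 2, saying
so both quiets the warning on x and lets gcc generate better code for
the switch.)

    int h(int e)   /* caller guarantees e is 0, 1 or 2 */
    {
        int x;
        switch (e) {
        case 0: x = 10; break;
        case 1: x = 20; break;
        case 2: x = 30; break;
        default: __builtin_unreachable(); /* tell gcc this cannot happen */
        }
        return x;
    }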

> This paragraph makes me think you equate false positives from this
> warning with those from -Wuninitialized.

Uh? No, I don't know what gave you this impression.

"May be unitialized" doesn't preclude the possibility that the variable is not used uninitialized.

But if the variable can never be used uninitialized and gcc is simply unable to prove it, the warning is bad. It may not be a "false positive" depending on your definition, but it is undesirable.

>> My opinion is that -Wmaybe-uninitialized would serve its purpose
>> better as part of -Wextra. People tend to use -Wall with -Werror
>> (either explicitly or implicitly by modifying the code until all
>> warnings are gone). What I see currently in projects where I
>> participate is that -Wmaybe-uninitialized is making things worse.
>> People don't investigate deeply the cause of the warning, they just
>> go for whatever "quick-fix" makes the compiler shut up. Quite
>> often, this involves extra code that is less readable and performs
>> worse, while it didn't even "fix" what caused the warning, it just
>> randomly ended up with code where the compiler doesn't warn
>> (possibly because the function got bigger, which changed inlining
>> decisions...).

> I agree that we need to be mindful of warnings leading to unnecessary
> and perhaps inappropriate code changes (as the author of a few
> warnings with this potential, it has been on my mind a lot).

> At the same time, not taking the time to properly analyze diagnostics
> and come up with the appropriate solutions seems like a problem
> that's orthogonal to the warnings themselves.  Disabling warnings in
> hopes of solving what's really a problem with the development process
> or a culture of a subset of projects doesn't seem like an optimal
> solution.

I agree with that. But if I complain about the mindless quickfixes, the reply is that customers ask for a library that does not warn, and the developers do not have infinite time, so they do the minimum to avoid warnings. This is strongly linked to the warning levels.

One could recommend that customers include the library headers with
-isystem to disable the warnings coming from inside the library, but
that's fragile (though it has gotten quite good), and most importantly
it also disables some warnings that are relevant to the user.
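
(I mean something like the following, with a made-up install path:)

    g++ -isystem /opt/thelib/include -Wall -Werror app.cpp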

If the warning is not enabled by default, but only turned on when
people are ready to investigate every warning thoroughly, the
quick-fix mentality is less likely to be present. People using
-Wmaybe-uninitialized need to be willing to ignore false positives, or
to disable them with pragmas.
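
The pragma route looks like this (a minimal sketch; the hard part in
real code is scoping it narrowly enough that it doesn't hide other
warnings):

    int k(int cond, int v)
    {
        int x;
        if (cond)
            x = v;
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
        int r = cond ? x : 0; /* analysed: x is only read when it was set */
    #pragma GCC diagnostic pop
        return r;
    }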

Note that similar arguments may apply to some other warnings that somehow made their way into -Wall when they shouldn't have, but for now I am only proposing to move -Wmaybe-uninitialized. Some people tend to consider that if a warning is not part of -Wall, it might as well not exist. Obviously I disagree with that.

> The description of -Wall is vague enough that arguments about which
> warnings should or shouldn't be enabled by it are unavoidably subject
> to individual biases and unlikely to lead to a consensus, either
> among ourselves or, even less likely, among users.

True. To me, one of the key documented properties of -Wall is being
"easy to avoid". -Wall -Werror should be a usable combination.
Anything not easy to avoid belongs in a different class, say -Wextra.
We can rename that to W1/W2/... if -Wall is too charged with
historical meaning.

One issue is that, especially in a large, long-lived code base, people
want to know when a change (or a new compiler) introduces a new
warning. However, they do not want to have to go through hundreds of
pre-existing, already analysed warnings. The easiest way to achieve
that is to make sure there are no warnings at all (-Werror). For
speculative warnings, that's really not convenient. I don't think
there is a good solution to this (marking the known false positives
with a pragma may be hard when instantiating some class somewhere
causes a warning in some almost unrelated place, and you want the
marking to be narrow enough that it won't also disable other
warnings), and it is independent of whether the warning is in -Wall or
-Wextra. I just have a feeling that splitting the easy and the
not-easy warnings may help developers handle them differently.

> A shared definition of a false positive should be one of the very
> first steps to coming closer to a consensus. Real-world (as opposed
> to anecdotal) data on the actual rates of false positives and
> negatives vs. true positives would also be most helpful, as would
> some consensus on the severity of the bugs the true positives
> expose, as well as some objective measure of the ease of
> suppression. There probably are others, but these would be a start.

This data is going to be super hard to get. Most projects have been
compiling for years and tweaking their code to avoid some warnings. We
do not get to see the code that people originally write; we can only
see what they commit.

--
Marc Glisse
