Near the end of every release a bunch of problem reports are raised for various punctuation, quoting, and spelling issues in GCC. In GCC 9 a total 28 such issues were submitted. A fair number of them are discovered as new or changed diagnostics are being translated. A checker was developed to help find some of these as well many others by scanning the gcc.pot file: contrib/check-internal-format-escaping.py. The checker automates the process of finding these issues but it doesn't prevent them. Being external to GCC the checker cannot easily distinguish between message strings that are expected to be translated and those that aren't.
To help avoid introducing the most glaring subset of these problems as early as possible while making it possible to differentiate messages that are expected to conform from those that need not, the attached patch adds a simple checker to GCC: -Wformat-diag. My goal is to improve consistency of messages and relieve the burden at the end of each release both on translators and those who then have to fix the problems. The warning detects some of the following: * unquoted operators consisting of two or more characters (e.g., == or ?:) * unquoted keywords, types, GCC built-ins and command line options, or identifiers with underscores * unquoted apostrophes and contractions like in can't * informal abbreviations (arg or reg) * unbalanced paired tokens (brackets, parentheses, or quotes) * duplicate or trailing spaces (it allows multiple spaces at the beginning of a diagnostic since that's used by the C++ front-end as an "indentation") * unintended control characters * inconsistent sentence capitalization To avoid diagnosing direct calls to pretty-printer functions that often format parts of the same diagnostic in multiple steps, the solution introduces the "_raw__" suffix to the __gcc_diag__ attributes that the warning then uses to avoid checking those calls. Functions with the raw attributes aren't checked. I tested the patch kit as a whole in an x86_64-linux build and fixed all the issues it pointed out except for 43 instances of the new warning in the Go front-end that I wasn't sure how best to handle (I will follow up on that with Ian). I also adjusted the affected tests. There were just a few issues specific to the i386 back-end but they had an impact on 9 tests. The impact of fixing the same kinds of issues in other back-ends is likely to be similar. Rather than trying to do the clean up across all supported targets I think it's better to handle the rest of the issues over time and with the help of back end maintainers who can more easily verify that tests still pass. With than in mind, I have prevented the warning from causing bootstrap failures. Once the cleanup is complete I expect to remove this and include the new warning in -Werror. Since a few files in GCC "misuse" the diagnostic machinery for debugging output leading up to internal errors (instead of calling a pp_xxx function) I suppressed the new warning in those files via #pragma GCC diagnostic ignored "-Wformat-diag". If we agree that calling pp_format or some such is the way to go I can work replacing those calls as a subsequent step. The subset should cover the following bug reports (some already resolved): 90176 - diagnostics should generally contain underscore only inside quotes 90162 - exclamation mark in diagnostic!!!!!1111!!!! 90158 - aarch64: wrong quotation in diagnostics 90157 - aarch64: unnecessary abbreviation in diagnostic 90156 - add linter check suggesting to replace %<%s%> with %qs 90149 - diagnostics containing BIT_FIELD_REF don't conform to diagnostics guideline 90121 - extra space in error message 90117 - Replace %<%s%> with %qs 90011 - [9 Regression] trailing space in diagnostic 79477 - Please write code more translator-friendly (unquoted options) 89936 - wrong punctuation in tree-profile.c (%<%s%>) Although the changes are mechanical, since I made them all by hand I broke up the patch into a series to make it easier to review: [PATCH 1/12] implement -Wformat-diag to detect quoting and spelling issues in GCC diagnostics [PATCH 2/12] fix diagnostic quoting/spelling in ada [PATCH 3/12] fix diagnostic quoting/spelling in Brig [PATCH 4/12] fix diagnostic quoting/spelling in the C front-end [PATCH 5/12] fix diagnostic quoting/spelling in c-family [PATCH 6/12] fix diagnostic quoting/spelling in C++ [PATCH 7/12] fix diagnostic quoting/spelling in libgcc [PATCH 8/12] fix diagnostic quoting/spelling in the middle-end [PATCH 9/12] adjust tests to quoting/spelling diagnostics fixes [PATCH 10/12] fix diagnostic quoting/spelling in D [PATCH 11/12] fix diagnostic quoting/spelling issues in i386 back-end [PATCH 12/12] fix diagnostic quoting/spelling issues in ObjC Martin