On 8/5/19 1:25 PM, Jason Merrill wrote:
On 8/1/19 7:35 PM, Martin Sebor wrote:
On 8/1/19 12:09 PM, Jason Merrill wrote:
On 7/22/19 12:34 PM, Martin Sebor wrote:
Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html
On 7/8/19 3:58 PM, Martin Sebor wrote:
The attached patch implements three new warnings:
* -Wstruct-not-pod triggers for struct definitions that are not
POD structs,
* -Wclass-is-pod triggers for class definitions that satisfy
the requirements on POD structs, and
* -Wmismatched-tags that triggers for class and struct declarations
with class-key that doesn't match either their definition or
the first declaration (if no definition is provided).
The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly
straightforward but the -Wmismatched-tags solution is slightly
unusual.
It collects struct and class declarations first and diagnoses
mismatches
only after the whole tramslation unit has been processed. This is so
that the definition of a class can guide which declarations to
diagnose
no matter which come first.
As Jonathan and I were saying in the connected discussion, the *pod
warnings enforce questionable coding guidelines, so I'd rather not
have them in the compiler.
What specifically do you consider questionable?
When defining a new class one has to decide whether to use 'struct'
and when to use 'class'. Since there's no difference, adopting any
convention to help make a consistent choice seems like a natural
thing to do. You could say (as I think someone in this thread did):
"make it 'class' when it has a ctor or dtor." Someone else might
add: "or assignment." Someone else still might go even further
and include "private members or bases or virtual functions." Since
all the type definitions end up equivalent regardless of the class-key,
all these choices seem equally reasonable. But all of them are also
arbitrary. The only difference is that by being based on a common
(and at least historically, widely relied on) property, the POD
convention is less arbitrary. That's probably why it's apparently
quite popular (as evident from the Stack Exchange thread).
Yes, all of them are somewhat arbitrary. This one seems like a poor
choice, for the reasons Jon and I gave in the other thread. IMO
'aggregate' would be a better property to use.
Which Stack Exchange thread do you mean?
The one I mentioned here:
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01472.html
i.e. the most popular answer to the question
When should you use a class vs a struct in C++?
is
I would recommend using structs as plain-old-data structures
without any class-like features, and using classes as aggregate
data structures with private data and member functions.
https://stackoverflow.com/questions/54585
My goal with the warning is to provide a way to help enforce
the convention we know is in use, both in GCC (at least until
it's removed) and in other projects. I'm not interested in
inventing something new even if it's "better" (in some sense)
than the convention we know is in use.
That said, as the C++ standard itself says, PODs (or in recent
revisions, trivial standard-layout types) are most commonly
associated with interoperability with C (and other languages).
Aggregates can have data members of types that don't exist in
C (references, pointers to members). So a convention that uses
the keyword 'struct' to reflect the common interoperability
aspect or that wants to convey that a C++ struct looks and feels
like one would expect of a C struct, will want to be based on
the concept of POD, not that of an aggregate.
-Wmismatched-tags is useful to have, given the MSVC/clang situation,
but I wonder about memory consumption from all the record keeping.
Do you have any overhead measurements?
I did think about the overhead but not knowing if the patch would
even be considered I didn't spend time optimizing it.
In an instrumented build of GCC with the patch I just did I have
collected the following stats for the number of elements in
the rec2loc hash table (for 998 files):
rec2loc elements locvec elements
min: 0 0
max: 2,785 3,303
mean: 537 1,043
median: 526 1,080
The locvec data are based on totals for the whole hash table, so
in the worst case the hash_map has 2,785 elements, and the sum of
all locvec lengths in all those elements is 3,303. Each rec2loc
element takes up 16 bytes, plus the size of the locvec data.
If my math is right (which doesn't happen too often) in the worst
case the bookkeeping overhead is 43KB for the hash_map plus on
the order of 140KB for the vectors (just doubling the number of
elements to account for capacity = 2 X size, times 24 for
the flag_func_loc_t element type). So 183K in the worst case
in GCC.
183k cumulative over all the GCC sources doesn't sound excessive, but...
There are a few ways to reduce that if it seems excessive.
One is by avoiding some waste in flag_func_loc_t which is
pair<tag_types, pair<bool, pair<tree, location_t>>>
tree could come first and tag_types and the bool could share
space. That should bring it down to 16 in LP64, for about
30% off the 183K.
Beyond that, we could change the algorithm to discard records
for prior declarations after the first definition has been seen
(and any mismatches diagnosed).
We could also only collect one record for each definition
in system headers rather than one for every declaration and
reference.
...these all sound worthwhile.
Okay, I'll look into it.
To be clear: it was 183K maximum for a single compilation, not
aggregate for all of them.
Martin