On 12/04/15 20:20, Daryl McDaniel wrote:
> On Friday, December 04, 2015, Laszlo Ersek wrote:
>> Now, after reading the -fno-common documentation in the gcc manual, I'm
>> actually *convinced* this is a gcc bug. The manual says on my RHEL-7.2
>> system (excerpt):
>>
>> -fno-common
>> In C code, controls the placement of uninitialized global
>> variables. Unix C compilers have traditionally permitted
>> multiple definitions of such variables in different
>> compilation units by placing the variables in a common
>> block. This is the behavior specified by -fcommon, and is
>> the default for GCC on most targets. On the other hand,
>> this behavior is not required by ISO C, [...]
>>
>
> I think this part of the GCC documentation was written a bit sloppily and used
> "definition" instead of "declaration".
I disagree. The documentation above describes the case when you have
--------
int x;
--------
in a *header* file in a project, and several .c files include that
header, and then those object files are linked together.
In the above example, each .c file that includes that header (and
thereby includes the "int x;" text in file scope) ends up with a
tentative definition of the "x" object. Assuming that each file has no
other, explicit, external definition for "x", the above text will end
up, in each file separately, *being* an external definition of "x". And
linking those together is forbidden by the standard.
> Since it is talking about
> "uninitialized" global variables, then it is a "tentative definition" and
> "the behavior is exactly as if the translation unit contains a
> file scope |declaration| of that identifier...".
You stopped quoting too early. The full text is:
"... the behavior is exactly as if the translation unit contains a file
scope declaration of that identifier, [...] with an initializer equal to 0".
The important part that you didn't quote is "initialzier equal to zero".
And with that in mind, if you read the paragraph in the standard just
preceding the one we quoted, it says
"If the declaration of an identifier for an object has file scope and an
initializer, the declaration is an external definition for the identifier".
In other words, the "as-if" initializer equal to zero language in the
"tentative definition" paragraph makes the previous paragraph (about the
external definition) apply.
>
>> Let us stop right there -- this behavior is *forbidden* by ISO C,
>> which can be proved if someone takes the time to decipher the nearly
>> impenetrable passages on declarations and definitions.
>
> I disagree, unless you are talking about the behavior of the -fno-common flag
> or where the GCC doc. says "this behavior is not required by ISO C".
> I think use of the -fno-common flag produces non-conforming behavior.
It's exactly the opposite. The -fno-common flag produces the conforming
behavior, and the default behavior (which silently merges "int x;" from
two preprocessed C files to the same object) is non-conforming.
If a header file wants to declare "int x", to be included by several
different .c files, then it must say "extern int x;", and exactly one of
the .c files must say "int x;".
>
>> This means that the above program (of two files) is equivalent to:
>>
>> f1-b.c:
>> ----
>> int x = 0;
>> ----
>>
>> f2-b.c:
>> ----
>> int x = 0;
>> int main(void) { return x; }
>> ----
>>
>> Therefore *each* file is ultimately required to behave "as if" it
>> contains one external definition for "x".
>
> I strongly disagree.
Okay, it's good to understand where we disagree.
> I believe the misunderstanding occurs from the very sloppy
> use of the term "translation unit" in the C specs; from ANSII (C89) through
> ISO/IEC 9899:TC3. The definition of a "translation unit" (TU) is fairly well
> defined as a single preprocessed C file,
IMO that definition (in 5.1.1.1) is pretty clear.
> but the specs are inconsistent in
> describing whether something applies to a SINGLE TU or a SET of TUs.
6.7 says "[...] somewhere in the entire program there shall be exactly
one external definition for the identifier", so that part is neutral to
what we consider a translation unit.
> Also poorly described is the difference between "translation" and "linking".
> While "linking" is beyond the scope of the spec.,
I agree that linking is beyond the scope of the spec.
> and since the process of
> "linking", and the use of "libraries", is mentioned many times, the spec could
> have been clearer about how this interacts with "translation units".
>
> From the spec as a whole, one can infer that a library is treated as a TU and
> when linked with other TUs to produce an executable program the entire set is
> treated as a single unit for linkage purposes.
I disagree. I re-read "5.1.1.1 Program structure" now, and I don't see
how it follows that separately created object files could count as a
single translation unit in the linking phase. The paragraph says
Translation units may be separately translated and then later
linked to produce an executable program.
Which means that my above interpretation (in the *-b.c translation
units) is right minimally for
gcc -std=c89 -pedantic -Wall -Wextra f1.c
gcc -std=c89 -pedantic -Wall -Wextra f2.c
gcc -o f f1.o f2.o
> The "implicit initialization" described in item 2 of section
> "External object definitions" seems to contradict other parts of the C spec.
> if one maintains that the TUs must be thought of as individual files in
> isolation.
I see no contradiction.
> Here are the specification sections related to this.
>
> 5.1.1.1 Program structure
> 1 A C program need not all be translated at the same time. The text of the
> program is kept in units called source files, (or preprocessing files) in
> this International Standard. A source file together with all the headers
> and source files included via the preprocessing directive #include is
> known
> as a preprocessing translation unit. After preprocessing, a preprocessing
> translation unit is called a translation unit. Previously translated
> translation units may be preserved individually or in libraries. The
> separate translation units of a program communicate by (for example) calls
> to functions whose identifiers have external linkage, manipulation of
> objects whose identifiers have external linkage, or manipulation of data
> files. Translation units may be separately translated and then later
> linked
> to produce an executable program.
>
> 6.2.2 Linkages of identifiers
> 1 An identifier declared in different scopes or in the same scope more than
> once can be made to refer to the same object or function by a process
> called linkage. [...]
Yes. Which is exactly what happens when you have "extern int x;" in many
separate files, in file and/or block scope.
The difference is that none of those cause storage to be reserved for
the object, and none of those is an external definition.
(Meaning that without an "int x;", with or without an initialzier, in
exactly one of the C source files, the entire program shall not link, if
the identifier is used in a non-sizeof expression.)
>
> 2 In the set of translation units and libraries that constitutes an entire
> program, each declaration of a particular identifier with external linkage
> denotes the same object or function. Within one translation unit, each
> declaration of an identifier with internal linkage denotes the same object
> or function. [...]
Yes. I agree. It doesn't contradict my point.
> 6.2.4 Storage durations of objects
> 3 An object whose identifier is declared with external or internal linkage,
> or with the storage-class specifier static has static storage duration.
> Its lifetime is the entire execution of the program and its stored value
> is initialized only once, prior to program startup.
>
> 6.9 External definitions
> Syntax
> 1 translation-unit:
> external-declaration
> translation-unit external-declaration
> external-declaration:
> function-definition
> declaration
>
> Semantics
> 4 As discussed in 5.1.1.1, the unit of program text after preprocessing is
> a translation unit, which consists of a sequence of external
> declarations. These are described as ''external'' because they appear
> outside any function (and hence have file scope). As discussed in 6.7,
> a declaration that also causes storage to be reserved for an object or a
> function named by the identifier is a definition.
>
> 5 An external definition is an external declaration that is also a
> definition of a function (other than an inline definition) or an object.
> If an identifier declared with external linkage is used in an expression
> (other than as part of the operand of a sizeof operator whose result is
> an integer constant), somewhere in the entire program there shall be
> exactly one external definition for the identifier; otherwise, there
> shall be no more than one.(140)
>
> (140) Thus, if an identifier declared with external linkage is not used
> in an expression, there need be no external definition for it.
>
> 6.9.2 External object definitions
> Semantics
> 2 A declaration of an identifier for an object that has file scope without
> an initializer, and without a storage-class specifier or with the
> storage-class specifier static, constitutes a tentative definition. If a
> translation unit contains one or more tentative definitions for an
> identifier, and the translation unit contains no external definition for
> that identifier, then the behavior is exactly as if the translation unit
> contains a file scope declaration of that identifier, with the composite
> type as of the end of the translation unit, with an initializer
> equal to 0.
It is stunning that by reading the exact same paragraphs, we arrive at
the opposite meaning.
> By my analysis, key items are:
> 1 Within the SET of TUs, each identifier with external linkage is the same
> object.
Each *declaration* of the same object with external linkage refers to
the same object, yes.
What we're discussing here are tentative definitions, which turn into
independent definitions (= resource allocations) for the same object,
once per translation unit (= preprocessed C source file).
> 2 If an identifier with external linkage is not initialized, then it will be
> implicitly initialized to 0.
*Assuming* that you have exactly one definition (= resource allocation)
for the object, in exactly one of the C files.
If you have only "extern int x;", then the identifier will have external
linkage, and it will never be initialized.
> 3 Within the entire program, there may be 0 or 1 initializer for an
> identifier
> with external linkage.
Yep.
> 4 The entire SET of TUs comprising a program must be examined in order to
> resolve identifiers with external linkage.
I strongly disagree. If you have:
header.h:
extern int x;
void f(void);
source1.c:
#include "header.h"
int x;
int main(void) { f(); return x; }
source2.c:
#include "header.h"
void f(void) { x = 1; }
then the compiler can immediately resolve "return x" in "source1.c",
without having to look at "source2.c".
> 5 The use, and definition, of the term "translation unit" within the
> C specifications is inconsistent and misleading in the presence of
> "libraries" and multiple source files making up a single executable
> program.
>
>> This patch will enforce the right behavior.
>>
>> Reviewed-by: Laszlo Ersek <[email protected]>
>>
>> Thanks!
>> Laszlo
>
> I don't agree, use of -fno-common is a bad idea
Why is it a bad idea? It enforces standards compliant behavior.
> and does not really fix the
> problem.
It does not fix the problem, but it turns silent unintended aliasing
problems into build time errors.
> The code needs to be re-examined and fixed.
I agree that the code needs to be fixed. The fix is to:
(a) declare the variables that aren't meant to be used outside of a
single C file STATIC;
(b) where several C files of a library instance access the same static
storage duration variable:
(b1) the variable must be declared "*extern* TYPE variable;" in an
internal header file,
(b2) all referring source files must include that internal header,
(b3) exactly one source file must *define* the variable with "TYPE
variable",
(b4) such variables must be named differently between edk2 modules (=
library instances, drivers, applications);
(c) in order to prevent *unintended* aliasing between same-name,
external linkage, static storage duration objects, due to gcc's
standards-breaking default behavior, Ard's patch is necessary.
It all boils down to a choice between the following two options:
(1) we stick with the current options, and notice unintended variable
aliasing between independent library instances only at runtime, when
things break inexplicably, *or*
(2) we enforce what the standard requires, and get nice error messages
about conflicting double definitions at build time. Given that some of
the current code doesn't actually conform to the standard, replacing the
obscure variable merging with error messages will expose current,
invalid code for what it is, and those parts will have to be fixed up.
What we have now is "merge variables", which sometimes matches the
intent, sometimes doesn't, and the only time to tell those cases apart
is runtime.
Thanks
Laszlo
>
>> On 12/03/15 13:54, Ard Biesheuvel wrote:
>>> The default behavior of the GCC compiler is to emit uninitialized
>>> globals into a COMMON section, where duplicate definitions are merged.
>>> This may result in unexpected behavior, since global variables appearing
>>> by the same name in different C files typically do not refer to the same
>>> conceptual data item.
>
> If the files are being linked together, then this is exactly the correct
> behavior. If the variables do not refer to the "same conceptual data item"
> then they either have to be renamed or made static.
>
> This is a common problem when combining code that comes from multiple
> unrelated sources. Checking for this type of name collision should be part
> of the porting process.
>
> Sincerely,
> Daryl McDaniel
>
>
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel