On May 23, 2013, at 11:29 PM, Richard Smith <[email protected]> wrote:
> On Thu, May 23, 2013 at 10:41 PM, John McCall <[email protected]> wrote:
> On May 23, 2013, at 10:23 PM, Richard Smith <[email protected]> wrote:
> > So... this problem was not really new in C++11. In C++98 it can be
> > witnessed for an inline function such as:
> >
> > inline const char *get() {
> > static const char *str = "foo";
> > return str;
> > }
>
> How is this different from the following?
>
> inline const char *get_nostatic() { return "foo"; }
>
> or
>
> inline const char *get_separate() {
> const char *temp = "foo";
> static const char *str = temp;
> return str;
> }
>
> Please find or add something in the standard which will allow us to
> not export a symbol for every string literal(*) that happens to be used
> in a function with weak linkage.
>
> Finding failed. In addition to the implications of the ODR, we have this:
>
> [dcl.fct.spec]p4: "A string literal in the body of an extern inline function
> is the same object in different translation units."
This is a really terrible language requirement. Does anyone actually do what's
necessary for this? I really can't imagine actually implementing it; it would
be a *ton* of new extern symbols.
> On the adding front, perhaps the simplest way to avoid generating such extra
> symbols (at least, in most cases) would be to specify that a string literal
> expression may produce the address of a different (static storage duration)
> object each time it is evaluated. However, even if we allow that, I don't
> think it's reasonable for an unchanging static storage duration pointer or
> reference to point at different objects depending on who is asking.
I agree; I just really don't want to have to export unique symbols for every
logging statement in an inline function.
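
To make that cost concrete, here is a minimal sketch of the kind of code I
mean. The names (log_line, handle_request) and the messages are made up for
illustration and are not from any real library. The point is only that, under
a strict reading of [dcl.fct.spec]p4, each of the three literals below would
need its own uniqued, exported symbol.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical logging helper (an assumption for this sketch).
inline int log_line(const char *msg) {
  assert(msg != nullptr);
  return std::fprintf(stderr, "%s\n", msg);
}

// Hypothetical inline function of the sort that lives in a widely included
// header. It contains three string literals, so three candidate symbols.
inline const char *handle_request(bool ok) {
  log_line("handle_request: enter");
  const char *status = ok ? "handle_request: ok" : "handle_request: failed";
  log_line(status);
  return status;
}
```

Almost nobody compares the addresses of strings like these, which is why the
requirement looks so expensive relative to its benefit.
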
So, let's see. I see two basic language designs and implementation strategies.
1. The first is that the source location of a string literal (function /
initializer where it appears and its source order therewithin) is actually a
crucial semantic property that compilers have to track/update through
everything. (Source order becomes a really interesting question when you
consider default argument expressions.) Not every string literal is blessed
this way; just the ones that show up in (a) inline functions or (b)
initializers of (weak-linkage) constexpr variables with static storage
duration. This is a major implementation pain, and it becomes a bizarre new
pervasive cost of C++ just to satisfy a requirement that very, very few people
care about. Hooray.
2. The second is that we somehow limit this problem to just initializing an
object of static storage duration.
There are three places where we can have initializers for the same object in
different translation units:
- constexpr static data members
- static data members of a class template
- static local variables in inline functions
The constexpr and non-constexpr cases are subtly different.
In the constexpr case, we know that everybody agrees that the initializer can
be constant-evaluated, and we can assume that everybody evaluates it to the
same constant. This gives us a number of ways to stably identify sub-objects
in the variable. If we actually have to emit the definition, that's easy
enough, too.
In the non-constexpr case, we don't know that, and we have to compile the code
as if there was a possibility that somebody might have emitted it as a dynamic
initializer. So I think we can't make any assumptions about string-literal
pointer values stored in the variable; we always have to load them out, which
is really unfortunate.
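
A rough sketch of the non-constexpr situation, assuming a made-up helper
(pick) that is deliberately not constexpr, plus hypothetical names get_dynamic
and check_stable:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper; not constexpr, so the initializer below is not
// required to be a constant expression.
inline const char *pick(const char *s) { return s; }

inline const char *get_dynamic() {
  // One translation unit might constant-fold this initializer while another
  // runs it as a guarded dynamic initializer.
  static const char *str = pick("foo");
  // Because of that, the pointer has to be loaded from str rather than
  // assumed to be the address of some particular literal object.
  return str;
}

// The variable is initialized once, so repeated calls agree on the pointer.
inline void check_stable() {
  assert(get_dynamic() == get_dynamic());
}
```

What is stable here is the value stored in str, not the identity of the
literal that any one translation unit would have used to initialize it.
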
Also, this entire approach seems to make the presence of 'constexpr' affect
ABI. (It does get caught by ODR, so that's *legal*, but I don't know that it's
*a good idea*.)
It's also unclear what *parts* of any given initializer will be
constant-initialized vs. dynamically-initialized; consider:
inline const char *second(const char *a, const char *b) { return b; }
inline const char *ident(const char *s) { return s; }
...
inline void test() {
static const char *strs[] = { second("a", "b"), ident("c"), "d" };
}
The only part that's "guaranteed" to be constant-initialized is the third
element, but a compiler that does constant-initialize this can fold both of the
others as well. And note that the string literals we use aren't 1-1 with the string
literals in the initializer; the uniquing scheme needs to be positional within
the initialized object to ensure that different translation units use the same
thing. (That is, "d" would have to be mangled as "_Z4testEv::strs[2]".)
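
Here is the same example reshaped slightly so the stored pointers can be
inspected. This is only a sketch: strs_view and check_positions are
hypothetical names added for illustration, and the first parameter of second
is left unnamed to avoid an unused-parameter warning.

```cpp
#include <cassert>
#include <cstring>

inline const char *second(const char * /*a*/, const char *b) { return b; }
inline const char *ident(const char *s) { return s; }

// Same initializer as in test(), but returning the array for inspection.
inline const char *const *strs_view() {
  static const char *strs[] = { second("a", "b"), ident("c"), "d" };
  return strs;
}

// Four literals appear in the initializer, but only three are stored, and
// each is identified by its position in the array rather than by source order.
inline void check_positions() {
  const char *const *s = strs_view();
  assert(std::strcmp(s[0], "b") == 0);  // "a" is never stored anywhere
  assert(std::strcmp(s[1], "c") == 0);
  assert(std::strcmp(s[2], "d") == 0);
}
```
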
I think the right solution is to:
- concede that (1) is the simpler language and implementation design but
- nonetheless refuse to implement it due to an insufficient indignant-user
count (and a reasonable suspicion of seeing a higher indignant-user count if we
did).
In practice, I believe most linkers will coalesce common string values within a
linkage unit, which is all that even the few people who care about this
actually want.
John.
_______________________________________________
cxx-abi-dev mailing list
[email protected]
http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev