On May 23, 2013, at 11:29 PM, Richard Smith <[email protected]> wrote:
> On Thu, May 23, 2013 at 10:41 PM, John McCall <[email protected]> wrote:
> On May 23, 2013, at 10:23 PM, Richard Smith <[email protected]> wrote:
> > So... this problem was not really new in C++11. In C++98 it can be 
> > witnessed for an inline function such as:
> >
> > inline const char *get() {
> >   static const char *str = "foo";
> >   return str;
> > }
> 
> How is this different from the following?
> 
>   inline const char *get_nostatic() { return "foo"; }
> 
> or
> 
>   inline const char *get_separate() {
>     const char *temp = "foo";
>     static const char *str = temp;
>     return str;
>   }
> 
> Please find or add something in the standard which will allow us to
> not export a symbol for every string literal(*) that happens to be used
> in a function with weak linkage.
> 
> Finding failed. In addition to the implications of the ODR, we have this:
> 
> [dcl.fct.spec]p4: "A string literal in the body of an extern inline function 
> is the same object in different translation units."

This is a really terrible language requirement.  Does anyone actually do what's 
necessary for this?  I really can't imagine actually implementing it;  it would 
be a *ton* of new extern symbols.

> On the adding front, perhaps the simplest way to avoid generating such extra 
> symbols (at least, in most cases) would be to specify that a string literal 
> expression may produce the address of a different (static storage duration) 
> object each time it is evaluated.  However, even if we allow that, I don't 
> think it's reasonable for an unchanging static storage duration pointer or 
> reference to point at different objects depending on who is asking.

I agree;  I just really don't want to have to export unique symbols for every 
logging statement in an inline function.

So, let's see.  I see two basic language designs and implementation strategies.

1.  The first is that the source location of a string literal (function / 
initializer where it appears and its source order therewithin) is actually a 
crucial semantic property that compilers have to track/update through 
everything.  (Source order becomes a really interesting question when you 
consider default argument expressions.)  Not every string literal is blessed 
this way;  just the ones that show up in (1) inline functions or (2) 
initializers of (weak-linkage) constexpr variables with static storage 
duration.  This is a major implementation pain, and it becomes a bizarre new 
pervasive cost of C++ just to satisfy a requirement that very, very few people 
care about.  Hooray.

2.  The second is that we somehow limit this problem to just initializing an 
object of static storage duration.

There are three places where we can have initializers for the same object in 
different translation units:
  - constexpr static data members
  - static data members of a class template
  - static local variables in inline functions
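
For concreteness, the three cases might look roughly like the sketch below. The names Config, Registry, get_cached and check_three_cases are made up for illustration (get_cached mirrors Richard's get() example); the only point is that each initializer can be seen by many translation units, all of which must agree on a single object.

```cpp
#include <cassert>
#include <cstring>

// 1. constexpr static data member: the initializer is part of the class
//    definition, so every translation unit that includes it sees it.
struct Config {
    static constexpr const char *name = "config";
};

// 2. static data member of a class template: the definition normally lives
//    in a header and is instantiated wherever it is used.
template <typename T>
struct Registry {
    static const char *label;
};
template <typename T>
const char *Registry<T>::label = "registry";

// 3. static local variable in an inline function.
inline const char *get_cached() {
    static const char *str = "foo";
    return str;
}

// Hypothetical helper: checks only what a single translation unit can
// observe. The cross-TU identity question cannot be shown in one file.
inline void check_three_cases() {
    assert(std::strcmp(Config::name, "config") == 0);
    assert(std::strcmp(Registry<int>::label, "registry") == 0);
    // The static local is initialized once, so repeated calls agree.
    assert(get_cached() == get_cached());
}
```

In all three, the open question is what the stored pointer value refers to when a different translation unit performs the initialization.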

The constexpr and non-constexpr cases are subtly different.

In the constexpr case, we know that everybody agrees that the initializer can 
be constant-evaluated, and we can assume that everybody evaluates it to the 
same constant.  This gives us a number of ways to stably identify sub-objects 
in the variable.  If we actually have to emit the definition, that's easy 
enough, too.

In the non-constexpr case, we don't know that, and we have to compile the code 
as if somebody might have emitted it as a dynamic initializer.  So I think we 
can't make any assumptions about string-literal pointer values stored in the 
variable;  we always have to load them out, which is really unfortunate.

Also, this entire approach seems to make the presence of 'constexpr' affect 
ABI.  (It does get caught by ODR, so that's *legal*, but I don't know that it's 
*a good idea*.)

It's also unclear what *parts* of any given initializer will be 
constant-initialized vs. dynamically-initialized;  consider:
  inline const char *second(const char *a, const char *b) { return b; }
  inline const char *ident(const char *s) { return s; }
  ...
  inline void test() {
    static const char *strs[] = { second("a", "b"), ident("c"), "d" };
  }
The only part that's "guaranteed" to be constant-initialized is the third 
element, but a compiler which does constant-initialize this can get both of the 
others.  And note that the string literals we use aren't 1-1 with the string 
literals in the initializer;  the uniquing scheme needs to be positional within 
the initialized object to ensure that different translation units use the same 
thing.  (That is, "d" would have to be mangled as "_Z4testEv::strs[2]".)
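
To illustrate what "positional within the initialized object" could mean, here is a toy sketch. It is not a real Itanium mangling and not a proposal for one: positional_name and positional_names are hypothetical helpers, and the "name[index]" spelling is just the informal notation used above. The key is derived from the slot in the initialized object, not from the source order of the literals.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: the key for the string literal whose address ends up in
// element `index` of the variable, written in the informal notation above.
std::string positional_name(const std::string &variable, std::size_t index) {
    return variable + "[" + std::to_string(index) + "]";
}

// Hypothetical: one key per element of an array with `count` elements.
std::vector<std::string> positional_names(const std::string &variable,
                                          std::size_t count) {
    std::vector<std::string> names;
    names.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        names.push_back(positional_name(variable, i));
    return names;
}
```

For the strs example, the initializer contains four literals but the array has three slots, so a scheme like this would produce three keys: a constant-initializing compiler would attach "b" to slot 0, "c" to slot 1, and "d" to slot 2, and "a" would get no key at all.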


I think the right solution is to:
  - concede that (1) is the simpler language and implementation design but
  - nonetheless refuse to implement it due to an insufficient indignant-user 
count (and a reasonable suspicion of seeing a higher indignant-user count if we 
did).

In practice, I believe most linkers will coalesce common string values within a 
linkage unit, which is all that even the few people who care about this 
actually want.
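
A small sketch of what that leaves code able to rely on. The function names here are invented for illustration. Whether the two literals share an address is up to the implementation (compiler and linker), so the code reports it rather than assuming it; only the value comparison is dependable.

```cpp
#include <cstring>

// Two inline functions that happen to use the same string value.
inline const char *log_tag_a() { return "shared text"; }
inline const char *log_tag_b() { return "shared text"; }

// May be true or false: it depends on whether the implementation chose to
// coalesce identical string literals. Nothing should depend on the answer.
inline bool literals_coalesced() {
    return log_tag_a() == log_tag_b();
}

// Always true: comparing contents does not depend on object identity.
inline bool literals_equal_by_value() {
    return std::strcmp(log_tag_a(), log_tag_b()) == 0;
}
```

Code that compares strings by value is unaffected by any of this; only code that compares literal addresses notices the difference.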

John.
_______________________________________________
cxx-abi-dev mailing list
[email protected]
http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev
