On Wed, Apr 17, 2024 at 03:26:53PM +0200, Jan Hubicka wrote:
> >
> > I've tried to see what actually happens during linking without LTO, so
> > compiled pr113208_0.C with -O1 -fkeep-inline-functions -std=c++20 with
> > vanilla trunk (so it has those 2 separate comdats, one for C2 and one
> > for C1), though I've changed the
> > void m(k);
> > line to
> > __attribute__((noipa)) void m(k) {}
> > in the testcase, then compiled pr113208_1.C with
> > -O2 -fkeep-inline-functions -std=c++20 -fno-omit-frame-pointer
> > so that one can clearly differentiate from where the implementation was
> > picked and finally added
> > template <typename _Tp> struct _Vector_base {
> > int g() const;
> > _Vector_base(int, int);
> > };
> >
> > struct QualityValue;
> > template <>
> > _Vector_base<QualityValue>::_Vector_base(int, int) {}
> > template <>
> > int _Vector_base<QualityValue>::g() const { return 0; }
> > int main () {}
> > If I link this, I see _ZN6vectorI12QualityValueEC2ERKS1_ and
> > _ZN6vectorI12QualityValueEC1ERKS1_ as separate functions with the
> > omitted frame pointer bodies, so clearly the pr113208_0.C versions prevailed
> > in both cases. It is unclear why that isn't the case for LTO.
>
> I think it is because of -fkeep-inline-functions, which makes the first
> object file define both symbols, while with LTO we optimize out one
> of them.
>
> So to reproduce the same behaviour with non-LTO we would probably need to
> use -O1 and arrange for the constructor to be uninlinable instead of using
> -fkeep-inline-functions.
Ah, you're right.
If I compile pr113208_0.C (with that one line modified) with
-O -fno-early-inlining -fdisable-ipa-inline -std=c++20
it has just _ZN6vectorI12QualityValueEC2ERKS1_ in the
_ZN6vectorI12QualityValueEC2ERKS1_ comdat and no
_ZN6vectorI12QualityValueEC1ERKS1_. If I then compile pr113208_1.C with
-O -fno-early-inlining -fdisable-ipa-inline -std=c++20 -fno-omit-frame-pointer
and link both together with the above mentioned third *.C file, I see
000000000040112a <_ZN6vectorI12QualityValueEC2ERKS1_>:
  40112a:  53                    push   %rbx
  40112b:  48 89 fb              mov    %rdi,%rbx
  40112e:  48 89 f7              mov    %rsi,%rdi
  401131:  e8 9c 00 00 00        call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
  401136:  89 c2                 mov    %eax,%edx
  401138:  be 01 00 00 00        mov    $0x1,%esi
  40113d:  48 89 df              mov    %rbx,%rdi
  401140:  e8 7b 00 00 00        call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
  401145:  5b                    pop    %rbx
  401146:  c3                    ret
i.e. the C2 prevailing from pr113208_0.s where it is the only symbol, and
0000000000401196 <_ZN6vectorI12QualityValueEC1ERKS1_>:
  401196:  55                    push   %rbp
  401197:  48 89 e5              mov    %rsp,%rbp
  40119a:  53                    push   %rbx
  40119b:  48 83 ec 08           sub    $0x8,%rsp
  40119f:  48 89 fb              mov    %rdi,%rbx
  4011a2:  48 89 f7              mov    %rsi,%rdi
  4011a5:  e8 28 00 00 00        call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
  4011aa:  89 c2                 mov    %eax,%edx
  4011ac:  be 01 00 00 00        mov    $0x1,%esi
  4011b1:  48 89 df              mov    %rbx,%rdi
  4011b4:  e8 07 00 00 00        call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
  4011b9:  48 8b 5d f8           mov    -0x8(%rbp),%rbx
  4011bd:  c9                    leave
  4011be:  c3                    ret
which is the C1 alias, originally aliased to C2 in the C5 comdat.
So that would match linker behavior where the C1 -> C2 alias prevails
but a different version of C2 prevails, so let's either make C1 a
non-alias, or an alias to a non-exported symbol, or something like that.
Though I admit I have no idea what we do with comdats during LTO; perhaps
doing what I said above could break stuff if the linker, after seeing the
LTO resulting objects, decides on prevailing symbols differently.
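
For illustration only, here is a toy model of the resolution seen in the
non-LTO link above. It is a sketch under assumptions, not how any real
linker or GCC's LTO is implemented: the types ComdatGroup and ObjectFile,
the function names, and the short symbol names "C1"/"C2" (standing in for
the mangled constructor names) are all hypothetical. The assumed rules are
that the first comdat group seen for a given signature is kept whole, later
groups with the same signature are dropped whole, and among kept groups the
first definition of a symbol wins.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types; not taken from any real linker.
struct ComdatGroup {
    std::string signature;             // group name, e.g. "C2" or "C5"
    std::vector<std::string> symbols;  // weak symbols defined in the group
};

struct ObjectFile {
    std::string name;
    std::vector<ComdatGroup> groups;
};

// Maps each symbol to the object file whose definition prevails, under
// the assumed rules stated in the lead-in.
std::map<std::string, std::string> link(const std::vector<ObjectFile>& objects) {
    std::set<std::string> seen_signatures;
    std::map<std::string, std::string> prevailing;
    for (const ObjectFile& obj : objects) {
        for (const ComdatGroup& group : obj.groups) {
            assert(!group.signature.empty());  // every group needs a signature
            if (!seen_signatures.insert(group.signature).second)
                continue;  // same signature already seen: drop the whole group
            for (const std::string& sym : group.symbols)
                prevailing.emplace(sym, obj.name);  // emplace keeps the first definition
        }
    }
    return prevailing;
}

// The scenario from this mail: the first object has only C2 in a C2 comdat,
// the second has C1 aliased to C2 in a C5 comdat.
std::map<std::string, std::string> mail_scenario() {
    ComdatGroup c2_only{"C2", {"C2"}};
    ComdatGroup c5{"C5", {"C1", "C2"}};
    ObjectFile first{"pr113208_0", {c2_only}};
    ObjectFile second{"pr113208_1", {c5}};
    return link({first, second});
}
```

In this model mail_scenario() resolves C2 to pr113208_0 and C1 to
pr113208_1. Because the two groups have different signatures, neither is
discarded, so C1 keeps the body it was aliased to in its own object while
the exported C2 comes from the other one.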
Jakub