https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109164
Bug ID: 109164
Summary: aarch64 thread_local initialization error with
-ftree-pre and -foptimize-sibling-calls
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: loganh at synopsys dot com
Target Milestone: ---
Created attachment 54687
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54687&action=edit
Bash script that reproduces the issue
With -ftree-pre, -foptimize-sibling-calls, and -O1 enabled, on
aarch64-linux-gnu, GCC 12.1.0 can generate code to access parts of thread_local
variables before the corresponding TLS init function is called if the variable
is accessed from a different TU than the variable is defined in. This
reordering could likely cause a number of different issues, but the one that
I've run into is that:
- When the compiler generates code to call a virtual function on a reference to
a global thread_local instance of an object defined in a different
translation unit, and
- The function calls itself in at least one branch,
the address of the object is fetched from TLS before it's initialized, and
when the vtable lookup is attempted on that object to call the virtual function
the program segfaults.
Here's an example of the kind of code that will trip it up:
    struct Struct {
        virtual void virtual_func();
    };

    extern thread_local Struct& thread_local_ref;

    bool other_func(void);

    bool test_func(void) {
        thread_local_ref.virtual_func();
        return other_func() && test_func();
    }
When this is compiled (on aarch64-linux-gnu, with -O1 and -ftree-pre and
-foptimize-sibling-calls) to an object file and then dumped with objdump -C -d,
this is the code produced:
0000000000000000 <test_func()>:
   0:   a9be7bfd    stp     x29, x30, [sp, #-32]!
   4:   910003fd    mov     x29, sp
   8:   a90153f3    stp     x19, x20, [sp, #16]
   c:   90000000    adrp    x0, 0 <thread_local_ref>
  10:   f9400000    ldr     x0, [x0]
  14:   d53bd041    mrs     x1, tpidr_el0
  18:   f8606834    ldr     x20, [x1, x0]
  1c:   90000013    adrp    x19, 0 <TLS init function for thread_local_ref>
  20:   f9400273    ldr     x19, [x19]
  24:   b4000053    cbz     x19, 2c <test_func()+0x2c>
  28:   94000000    bl      0 <TLS init function for thread_local_ref>
  2c:   f9400280    ldr     x0, [x20]
  30:   f9400001    ldr     x1, [x0]
  34:   aa1403e0    mov     x0, x20
  38:   d63f0020    blr     x1
  3c:   94000000    bl      0 <other_func()>
  40:   12001c00    and     w0, w0, #0xff
  44:   35ffff00    cbnz    w0, 24 <test_func()+0x24>
  48:   a94153f3    ldp     x19, x20, [sp, #16]
  4c:   a8c27bfd    ldp     x29, x30, [sp], #32
  50:   d65f03c0    ret
Looking at addresses 0x14 through 0x18, you can see that the address of
'thread_local_ref' is read from the TLS block for the thread; the first time
this function is called, this will result in register x20 containing zero,
since the TLS block isn't initialized until the function call at 0x28. Directly
after that, at location 0x2c, a read is attempted from the address in register
x20 (zero), causing a segfault. Without -ftree-pre and -foptimize-sibling-calls,
and without letting `test_func` call itself on at least one path, the code to
get the address of `thread_local_ref` is generated after the TLS init call, so
the problem does not occur.
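To make the ordering problem easier to see without reading the assembly, here
is a minimal single-file C++ sketch that models it. This is not the code GCC
generates and it is not taken from the attached script. The names `instance`,
`slot`, `tls_init`, `load_then_init` and `init_then_load` are hypothetical
stand-ins, and `virtual_func` returns an int here (unlike the example above)
only so the result can be checked. `slot` plays the role of the TLS storage
behind `thread_local_ref`, and `tls_init` plays the role of the TLS init
function.

```cpp
#include <cassert>

struct Struct {
    virtual int virtual_func() { return 42; }
    virtual ~Struct() = default;
};

// Hypothetical object that the reference is meant to be bound to.
static Struct instance;

// Models the per-thread storage behind thread_local_ref: it holds a null
// pointer until the (modelled) TLS init function has run on this thread.
static thread_local Struct* slot = nullptr;

// Models "TLS init function for thread_local_ref".
void tls_init() {
    if (slot == nullptr) {
        slot = &instance;
    }
}

// Models the ordering in the disassembly: the slot is read (0x14-0x18)
// before the init call (0x28), so the first call on a thread sees null.
Struct* load_then_init() {
    Struct* early = slot;
    tls_init();
    return early;
}

// Models the expected ordering: run the init function first, then read.
Struct& init_then_load() {
    tls_init();
    assert(slot != nullptr);
    return *slot;
}
```

On a thread where nothing has run `tls_init` yet, `load_then_init()` returns a
null pointer, which corresponds to x20 being zero at 0x2c. `init_then_load()`
always returns a usable object, which is the behaviour seen when the load is
emitted after the TLS init call.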
I've attached a script that will reproduce what I've shown here, as well as
demonstrate the issue in action with a full executable that will produce the
segfault I've described.