Issue 160968
Summary [lld][AArch64][PGO] Deadlock in writeSections with PGO on AArch64
Labels lld
Assignees
Reporter gulfemsavrun
We ran into this issue during a multi-stage build in which we build our toolchain with PGO. The hang occurs after we build Clang with PGO enabled and then use that PGO-optimized Clang to build compiler-rt. The linker stalls while linking some of the runtimes' configure-time tests, such as:
https://github.com/llvm/llvm-project/blob/597f93d36b035faeb63f4ba0d61a8b8e25eddaab/compiler-rt/cmake/config-ix.cmake#L98

I attached `gdb` to the stalled `lld` process and found the following call stack:
```
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/aarch64/syscall_cancel.S:50
#1 0x0000ffffa4a22658 in __internal_syscall_cancel (a1=a1@entry=281474580353104, a2=a2@entry=393, a3=a3@entry=0, a4=a4@entry=0, a5=a5@entry=0, a6=a6@entry=4294967295, nr=nr@entry=98) at ./nptl/cancellation.c:49
#2 0x0000ffffa4a229f0 in __futex_abstimed_wait_common64 (private=0, futex_word=0xffffe8601050, expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
#3  __futex_abstimed_wait_common (futex_word=0xffffe8601050, expected=0, clockid=0, abstime=0x0, private=0, cancel=true) at ./nptl/futex-internal.c:87
#4 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xffffe8601050, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0)
    at ./nptl/futex-internal.c:139
#5 0x0000ffffa4a253ec in __pthread_cond_wait_common (cond=0xffffe8601030, mutex=0xffffe8601000, clockid=0, abstime=0x0) at ./nptl/pthread_cond_wait.c:426
#6  ___pthread_cond_wait (cond=0xffffe8601030, mutex=0xffffe8601000) at ./nptl/pthread_cond_wait.c:458
#7  0x0000aaaaed0f8ea4 in std::__2::__libcpp_condvar_wait[abi:ne220000](pthread_cond_t*, pthread_mutex_t*) (__cv=0xffffe8601050, __m=0x189)
    at /home/gulfem/llvm-project-gulfem/libcxx/include/__thread/support/pthread.h:122
#8 std::__2::condition_variable::wait (this=0xffffe8601050, lk=...) at ../../../llvm-project-gulfem/libcxx/src/condition_variable.cpp:36
#9 0x0000aaaaecfcfa3c in std::__2::condition_variable::wait[abi:nn220000]<llvm::parallel::detail::Latch::sync() const::{lambda()#1}>(std::__2::unique_lock<std::__2::mutex>&, llvm::parallel::detail::Latch::sync() const::{lambda()#1}) (this=0xffffe8601030, __lk=..., __pred=...) at /home/gulfem/clang-prod-linux-build-relwithdebuginfo/./bin/../include/c++/v1/__condition_variable/condition_variable.h:112
#10 llvm::parallel::detail::Latch::sync (this=0xffffe8600ff8) at llvm/include/llvm/Support/Parallel.h:85
#11 llvm::parallel::TaskGroup::~TaskGroup (this=0xffffe8600ff8) at ../../../../llvm-project-gulfem/llvm/lib/Support/Parallel.cpp:191
#12 0x0000aaaae45d6b50 in (anonymous namespace)::Writer<llvm::object::ELFType<(llvm::endianness)1, true> >::writeSections (this=0xffffe8600ea8) at ../../../../llvm-project-gulfem/lld/ELF/Writer.cpp:3026
#13 (anonymous namespace)::Writer<llvm::object::ELFType<(llvm::endianness)1, true> >::run (this=0xffffe8600ea8) at ../../../../llvm-project-gulfem/lld/ELF/Writer.cpp:376
#14 lld::elf::writeResult<llvm::object::ELFType<(llvm::endianness)1, true> > (ctx=...) at ../../../../llvm-project-gulfem/lld/ELF/Writer.cpp:100
#15 0x0000aaaae42f1c6c in lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::endianness)1, true> > (this=<optimized out>, this@entry=0xaaab2a1035c0, args=...)
    at ../../../../llvm-project-gulfem/lld/ELF/Driver.cpp:3503
#16 0x0000aaaae42ce1f4 in lld::elf::LinkerDriver::linkerMain (this=<optimized out>, this@entry=0xaaab2a1035c0, argsArr=...) at ../../../../llvm-project-gulfem/lld/ELF/Driver.cpp:729
#17 0x0000aaaae42caba4 in lld::elf::link (args=..., stdoutOS=..., stderrOS=..., exitEarly=<optimized out>, disableOutput=<optimized out>) at ../../../../llvm-project-gulfem/lld/ELF/Driver.cpp:140
#18 0x0000aaaae418d010 in lld::unsafeLldMain (args=..., stdoutOS=..., stderrOS=..., drivers=..., exitEarly=true) at ../../../../llvm-project-gulfem/lld/Common/DriverDispatcher.cpp:163
#19 0x0000aaaae3bcd3d0 in lld_main (argc=argc@entry=32, argv=argv@entry=0xffffe8602f38) at ../../../../llvm-project-gulfem/lld/tools/lld/lld.cpp:90
#20 0x0000aaaae418af4c in findTool (Argc=32, Argv=0xffffe8602f38, Argv0=0xffffe8603e8c "/home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/./bin/ld.lld")
 at ../../../../llvm-project-gulfem/llvm/tools/llvm-driver/llvm-driver.cpp:68
#21 0x0000aaaae4185830 in main (Argc=32, Argv=0xffffe8602f38) at ../../../../llvm-project-gulfem/llvm/tools/llvm-driver/llvm-driver.cpp:85
```
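
For readers unfamiliar with the pattern in frames #5 through #11, the following is a minimal sketch of a counting latch built on a mutex and a condition variable. It is not the LLVM source: `SketchLatch` and `runTasks` are hypothetical names, and the code only models the general shape of what `Latch::sync()` appears to be doing in the trace, namely waiting until a counter of outstanding tasks reaches zero.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Simplified counting latch (hypothetical; modelled loosely on the role that
// llvm::parallel::detail::Latch plays in the trace, not copied from LLVM).
class SketchLatch {
  std::uint32_t count = 0;
  mutable std::mutex mtx;
  mutable std::condition_variable cv;

public:
  // Register one outstanding task.
  void inc() {
    std::lock_guard<std::mutex> lock(mtx);
    ++count;
  }

  // Mark one task as finished; wake waiters when none remain.
  void dec() {
    std::lock_guard<std::mutex> lock(mtx);
    if (--count == 0)
      cv.notify_all();
  }

  // Block until every inc() has been matched by a dec().
  void sync() const {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [&] { return count == 0; });
  }
};

// Hypothetical helper: start n worker threads, wait on the latch,
// and return how many tasks had completed when sync() returned.
int runTasks(int n) {
  SketchLatch latch;
  std::atomic<int> done{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    latch.inc(); // register the task before its thread starts
    workers.emplace_back([&] {
      ++done;
      latch.dec();
    });
  }
  latch.sync(); // returns only after all n tasks called dec()
  int completed = done.load();
  for (auto &t : workers)
    t.join();
  return completed;
}
```

In this model, `sync()` never returns if some registered task never calls `dec()`, or if the counter update is not observed by the waiter. The trace alone does not show which of these, if either, applies to the hang reported here.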

I can reproduce the hang intermittently, but not on every run, with tests like the one mentioned above:
```
[1/2] /home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/./bin/clang --target=x86_64-unknown-fuchsia --sysroot=/home/gulfem/fuchsia-idk/arch/x64/sysroot -DCOMPILER_RT_HAS_Z_TEXT --target=x86_64-unknown-fuchsia -I/home/gulfem/fuchsia-idk/pkg/sync/include -I/home/gulfem/fuchsia-idk/pkg/fdio/include -fPIC -fno-semantic-interposition -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/runtimes/runtimes-x86_64-unknown-fuchsia-bins=../../../../../../llvm-project-gulfem -ffile-prefix-map=/home/gulfem/llvm-project-gulfem/= -no-canonical-prefixes --start-no-unused-arguments --unwindlib=none --end-no-unused-arguments -nostdlib++ -nostdinc++ -nodefaultlibs -Wl,-z,text -MD -MT CMakeFiles/cmTC_f3b74.dir/src.c.obj -MF CMakeFiles/cmTC_f3b74.dir/src.c.obj.d -o CMakeFiles/cmTC_f3b74.dir/src.c.obj -c /home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/runtimes/runtimes-x86_64-unknown-fuchsia-bins/CMakeFiles/CMakeScratch/TryCompile-gy54Ab/src.c
clang: warning: -Wl,-z,text: 'linker' input unused [-Wunused-command-line-argument]
[2/2] : && /home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/./bin/clang --target=x86_64-unknown-fuchsia --sysroot=/home/gulfem/fuchsia-idk/arch/x64/sysroot --target=x86_64-unknown-fuchsia -I/home/gulfem/fuchsia-idk/pkg/sync/include -I/home/gulfem/fuchsia-idk/pkg/fdio/include -fPIC -fno-semantic-interposition -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/home/gulfem/clang-prod-linux-build-relwithdebuginfo/tools/clang/stage2-instrumented-bins/runtimes/runtimes-x86_64-unknown-fuchsia-bins=../../../../../../llvm-project-gulfem -ffile-prefix-map=/home/gulfem/llvm-project-gulfem/= -no-canonical-prefixes --start-no-unused-arguments --unwindlib=none --end-no-unused-arguments -nostdlib++ -nostdinc++ -nodefaultlibs -Wl,-z,text -L/home/gulfem/fuchsia-idk/arch/x64/lib -fuse-ld=lld CMakeFiles/cmTC_f3b74.dir/src.c.obj -o cmTC_f3b74  -lc && :
```

This is the source file for these tests:
```
int main(void) { return 0; }
```

We only see this hang with PGO builds on AArch64; it does not occur on x86 or in non-PGO builds.

The hang started after we switched from `Front-End PGO` to `IR PGO` in https://github.com/llvm/llvm-project/pull/156060.