https://bugs.kde.org/show_bug.cgi?id=460951

            Bug ID: 460951
           Summary: infinite loop in ARM-64 version of instrumentation
                    with output VG_ calls at superblock and instruction
                    level
    Classification: Developer tools
           Product: valgrind
           Version: 3.19.0
          Platform: unspecified
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: vex
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

SUMMARY
I found a bug in the ARM64 version of valgrind (in both versions 3.16.1 and
3.19.0) that causes an infinite loop in some instrumentation code. The lackey
tool is one example that triggers it. The bug is reproducible with lackey on
ARM64 by running:
  valgrind --tool=lackey --trace-superblocks=yes ./a.out
I can reproduce it with every C program I've tried, even the simplest one,
for example:
  int main(int argc, char *argv[]) { int x;  x = 6; return 0; }

The instrumented code gets stuck repeating the same two superblocks over and
over. I suspect the bug is in obtaining the correct return address when
instrumenting at the granularity of superblocks (and of individual
instructions). More specifically, the return address seems to be wrong when
the instrumentation calls certain functions: VG_printf, other VG_ output
functions in certain cases (described in the ADDITIONAL INFORMATION section
below), VG_message_flush, and possibly others. The x86 versions of the lackey
tool's superblock tracing do not have this bug.

STEPS TO REPRODUCE
1. Compile with debugging: gcc -g prog.c
   (other gcc command-line options I tried are listed in ADDITIONAL
   INFORMATION)
2. Run lackey with the trace-superblocks option:
   valgrind --tool=lackey --trace-superblocks=yes ./a.out

OBSERVED RESULT

Infinite loop over the same two superblocks (always SB 04954ecc and
SB 04954ed8 on my system) in the VEX-instrumented code on ARM-64.
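
To check a captured trace for this pattern mechanically, something like the
following C sketch could be used. It assumes each trace line has the form
"SB <hex address>" as in the two lines quoted above. The function names
parse_sb_line and ends_in_two_block_loop are made up for this example and are
not part of valgrind or lackey.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse one trace line of the assumed form "SB 04954ecc".
   Returns 1 and stores the address on success, 0 for any other line. */
int parse_sb_line(const char *line, unsigned long *addr)
{
    assert(line != NULL && addr != NULL);
    return sscanf(line, "SB %lx", addr) == 1;
}

/* Returns 1 if the last `window` entries of the trace alternate strictly
   between two distinct addresses (A B A B ...), otherwise 0.
   Windows shorter than 4 entries are rejected as too short to judge. */
int ends_in_two_block_loop(const unsigned long *trace, size_t n, size_t window)
{
    if (window < 4 || n < window)
        return 0;

    const unsigned long *tail = trace + (n - window);

    if (tail[0] == tail[1])
        return 0;                 /* a single repeated block, not a pair */

    for (size_t i = 2; i < window; i++) {
        if (tail[i] != tail[i - 2])
            return 0;             /* the alternation is broken here */
    }
    return 1;
}
```

A caller would read the lackey output line by line, collect the parsed
addresses in an array, and stop the run once ends_in_two_block_loop reports a
loop over a reasonably large window.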

EXPECTED RESULT

The instrumentation should not get into an infinite loop, and the program
should trace through all its superblocks until it completes (a.out does not
itself contain an infinite loop).

SOFTWARE/OS VERSIONS

Linux: Linux 5.15.69-rockchip64 #22.08.2 SMP PREEMPT Wed Sep 21 19:28:26 UTC
2022 aarch64 GNU/Linux
gcc: gcc (Debian 10.2.1-6) 
processor: ARM  v8.4
valgrind:  3.19.0 built from source (also occurs on debian installed version
3.16.1) 

ADDITIONAL INFORMATION

I've done some experimentation with the lackey code, and this is what I've
found about what specifically seems to trigger the bug:
* Calling VG_printf in the instrumentation function always causes this
problem.
* Calls to VG_emit, VG_message, and VG_umsg work if the format string does
not contain a '\n' character, but if it does contain '\n', the
instrumentation gets into the infinite loop.
* Explicitly calling VG_message_flush triggers the infinite loop.
* I can trigger the bug with only one function call at each instrumentation
point, so the cause is not having more than one call to an instrumentation
function at a single instrumentation point (e.g. it is not calling both
add_one_SB_entered and trace_superblock that causes the bug in lackey; a call
to just one of these, with a call to VG_printf added in the instrumentation
function, triggers it).
* Passing an Addr parameter to an instrumentation function (as in
trace_superblock in lackey) is also not the problem, so parameter passing in
general is unlikely to be the cause.
* I've also found that calls to VG_lseek in instrumentation code fail on ARM
(they work fine on x86).  This may be related or may be a different bug.
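
One possible reading of the observations above is that every triggering call
ends up flushing buffered output, while the non-triggering calls only append
to a buffer. The toy model below illustrates that reading in plain C. It is
an assumption, not valgrind's code: MsgBuf, msgbuf_emit and msgbuf_flush are
hypothetical names, and whether valgrind's message buffering really behaves
this way would need to be checked in its sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy message buffer (hypothetical, NOT valgrind's implementation). */
typedef struct {
    char     buf[256];
    size_t   used;         /* bytes currently buffered */
    unsigned flush_calls;  /* how often the modelled flush path was reached */
} MsgBuf;

void msgbuf_init(MsgBuf *b)
{
    assert(b != NULL);
    b->used = 0;
    b->flush_calls = 0;
}

/* Models an explicit flush: discard the buffered bytes and record that
   the flush path (where a real write would happen) was reached. */
void msgbuf_flush(MsgBuf *b)
{
    assert(b != NULL);
    b->used = 0;
    b->flush_calls++;
}

/* Models an output call: append the text, flushing when the buffer is
   full or whenever a newline has just been appended. */
void msgbuf_emit(MsgBuf *b, const char *text)
{
    assert(b != NULL && text != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        if (b->used == sizeof b->buf)
            msgbuf_flush(b);
        b->buf[b->used++] = *p;
        if (*p == '\n')
            msgbuf_flush(b);
    }
}
```

In this model a short string without '\n' never reaches msgbuf_flush, while a
string containing '\n' and an explicit msgbuf_flush call both do, which is
the same split I see between the working and the failing calls.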

I'm trying to write a valgrind tool that instruments at the instruction-level. 
My valgrind tool works fine for x86, but has this infinite loop issue on ARM-64
in a similar way to lackey's.  

I've also tried compiling with these different gcc flags, and all trigger the
bug:
* gcc -g
* gcc -ggdb
* gcc -O0 -ggdb -fno-omit-frame-pointer
* gcc -Wall -ggdb -O0 -fno-asynchronous-unwind-tables -fno-dwarf2-cfi-asm
-fno-pic -no-pie -fno-omit-frame-pointer

I don't know an easy way to debug valgrind-instrumented code at runtime, so I
have not looked into this further, but I'd really like to use this
functionality in a valgrind tool I'm building (again, it works fine on x86
but has this bug on ARM).  Perhaps the problem involves some call
optimization (perhaps specific to system call code, such as write calling a
flush function that could be tail-call optimized), with the valgrind ARM
instrumentation code then not finding the right return address and getting
into an infinite loop.

I'm hoping someone can fix this bug.  My guess is that it is somewhere in the
VEX code for ARM and has something to do with return addresses in VG_
functions that make system calls, but this is only a guess.

Thank you for your help!

system/SW version details:

$ cat /proc/cpuinfo
...
processor  : 5
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 2

$ inst/bin/valgrind --version     # version I built from source
valgrind-3.19.0
$ valgrind --version    # system installed version as part of debian install
valgrind-3.16.1

$  uname -a
Linux 5.15.69-rockchip64 #22.08.2 SMP PREEMPT Wed Sep 21 19:28:26 UTC 2022
aarch64 GNU/Linux

$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
