On Tue, 8 Feb 2022 08:17:17 GMT, Christian Hagedorn <chaged...@openjdk.org> wrote:
>> When printing the native stack trace on Linux (mostly done for hs_err >> files), it only prints the method with its parameters and a relative offset >> in the method: >> >> Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, >> free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native >> code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, >> bool, DirectiveSet*)+0xec >> V [libjvm.so+0x8303ef] >> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, >> JavaThread*)+0x69 >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f >> >> This makes it sometimes difficult to see where exactly the methods were >> called from and sometimes almost impossible when there are multiple >> invocations of the same method within one method. >> >> This patch improves this by providing source information (filename + line >> number) to the native stack traces on Linux similar to what's already done >> on Windows (see >> [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): >> >> Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, >> free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native >> code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 >> (c1_Compilation.cpp:607) >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, >> bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) >> V [libjvm.so+0x8303ef] >> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 >> (compileBroker.cpp:2291) >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df >> (compileBroker.cpp:1966) >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, >> JavaThread*)+0x69 (compilerThread.cpp:59) >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d >> (thread.cpp:1297) >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f >> (os_linux.cpp:705) >> >> For Linux, we need to parse the debug symbols which are generated by GCC in >> DWARF - a standardized debugging format. This patch adds support for DWARF >> 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with >> x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still >> experimental and not generated for HotSpot. However, newer GCC version may >> soon generate DWARF 5 by default in which case this parser either needs to >> be extended or the build of HotSpot configured to only emit DWARF 4. >> >> The code follows the parsing steps described in the official DWARF 4 spec: >> https://dwarfstd.org/doc/DWARF4.pdf >> I added references to the corresponding sections throughout the code. >> However, I tried to explain the steps from the DWARF spec directly in the >> code (method names, comments etc.). This allows to follow the code without >> the need to actually deep dive into the spec. >> >> The comments at the `Dwarf` class in the `elf.hpp` file explain in more >> detail how a DWARF file is structured and how the parsing algorithm works to >> get to the filename and line number information. There are more class >> comments throughout the `elf.hpp` file about how different DWARF sections >> are structured and how the parsing algorithm needs to fetch the required >> information. Therefore, I will not repeat the exact workings of the >> algorithm here but refer to the code comments. I've tried to add as much >> information as possible to improve the readability. >> >> Generally, I've tried to stay away from adding any assertions as this code >> is almost always executed when already processing a VM error. Instead, the >> DWARF parser aims to just exit gracefully and possibly omit source >> information for a stack frame instead of risking to stop writing the hs_err >> file when an assertion would have failed. To debug failures, `-Xlog:dwarf` >> can be used with `info`, `debug` or `trace` which provides logging messages >> throughout parsing. >> >> **Testing:** >> Apart from manual testing, I've added two kinds of tests: >> - A JTreg test: Spawns new VMs to let them crash in various ways. The test >> reads the created hs_err files to check if the DWARF parsing could correctly >> find the filename and line number. For normal HotSpot files, I could not >> check against hardcoded filenames and line numbers as they are subject to >> change (especially line number can quickly become different). I therefore >> just added some sanity checks in the form of "found a non-empty file" and >> "found a non-zero line number". On top of that, I added tests that let the >> VM crash in custom C files (which will not change). This enables an >> additional verification of hardcoded filenames and line numbers. >> - Gtests: Directly calling the `get_source()` method which initiates DWARF >> parsing. Tested some special cases, for example, having a buffer that is not >> big enough to store the filename. >> >> On top of that, there are also existing JTreg tests that call >> `-XX:NativeMemoryTracking=detail` which will print a native stack trace with >> the new source information. These tests were also run as part of the >> standard tier testing and can be considered as sanity tests for this >> implementation. >> >> To make tests work in our infrastructure or if some other setups want to >> have debug symbols at different locations, I've added support for an >> additional `_JVM_DWARF_PATH` environment variable. This variable can >> specify a path from which the DWARF symbol file should be read by the parser >> if the default locations do not contain debug symbols (required some `make` >> changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. >> The JTreg test, however, also works if there are no symbols available. In >> that case, the test just skips all the assertion checks for the filename and >> line number. >> >> I haven't run any specific performance testing as this new code is mainly >> executed when an error will exit the VM and only if symbol files are >> available (which is normally not the case when using Java release builds as >> a user). >> >> Special thanks to @tschatzl for giving me some pointers to start based on >> his knowledge from a DWARF 2 parser he once wrote in Pascal and for >> discussing approaches on how to retrieve the source information and to >> @erikj79 for providing help for the changes required for `make`! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one > additional commit since the last revision: > > Make dwarf tag NOT_PRODUCT First pass, did not dive into details of the state machine yet. src/hotspot/share/utilities/elfFile.cpp line 319: > 317: } > 318: log_develop_info(dwarf)("No separate .debuginfo file for library %s. > It already contains the required DWARF sections.", _filepath); > 319: _dwarf_file = new (std::nothrow) DwarfFile(_filepath); Would it be useful to explicitly bail out on a `nullptr` value here to avoid crashes below? src/hotspot/share/utilities/elfFile.cpp line 357: > 355: } > 356: > 357: strcpy(debug_pathname, _filepath); I'm always a bit uneasy using "raw" `strcpy` instead of `strncpy` and friends. The code seems to be correct though. src/hotspot/share/utilities/elfFile.cpp line 358: > 356: > 357: strcpy(debug_pathname, _filepath); > 358: char* last_slash = strrchr(debug_pathname, '/'); It's probably no big issue hardcoding the forward slash here instead of using `os::file_separator()` in this method. src/hotspot/share/utilities/elfFile.cpp line 407: > 405: bool ElfFile::load_dwarf_file_from_env_path_folder(const char* env_path, > const char* folder, const char* debug_filename, const uint32_t crc) { > 406: char* debug_pathname = NEW_RESOURCE_ARRAY(char, strlen(env_path) + > strlen(folder) + strlen(debug_filename) + 2); > 407: strcpy(debug_pathname, env_path); Similar to other resource allocations, this should bail out if the result is `nullptr`. src/hotspot/share/utilities/elfFile.cpp line 566: > 564: // > http://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html#Separate-Debug-Files. > 565: uint32_t ElfFile::gnu_debuglink_crc32(uint32_t crc, uint8_t* buf, const > size_t len) { > 566: crc = ~crc & 0xffffffff; The masks are unnecessary here but don't hurt. Feel free to keep. src/hotspot/share/utilities/elfFile.cpp line 576: > 574: log_develop_info(dwarf)("Open DWARF file: %s", filepath); > 575: _dwarf_file = new (std::nothrow) DwarfFile(filepath); > 576: if (!_dwarf_file->is_valid_dwarf_file()) { This should bail out if the `new` returned a `nullptr`. src/hotspot/share/utilities/elfFile.cpp line 686: > 684: } > 685: > 686: // We must align to twice the address size. Since alignment is based on address size? I.e. above, at the check whether addresses are correct, define address size and then multiply by 2 here. This would also make the condition above look nicer, i.e. move the `[NOT_]LP64` outside of the condition. src/hotspot/share/utilities/elfFile.cpp line 784: > 782: } > 783: > 784: if (!_reader.read_byte(&_header._address_size) || > NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)) > { Since this is the second time for the clause `|| NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)` maybe it is useful to make a constant out of the accepted address size somewhere instead of repeating this over and over. It's value could even be something like `sizeof(intptr_t)` or so. src/hotspot/share/utilities/elfFile.cpp line 814: > 812: log_develop_trace(dwarf)("Series of declarations [code, tag]:"); > 813: AbbreviationDeclaration declaration; > 814: bool found_matching_declaration = false; This variable is never used. Remove. src/hotspot/share/utilities/elfFile.cpp line 944: > 942: #else > 943: _reader.move_position(8); > 944: #endif Use `AddressSize` or similar here instead of the `#ifdef`. src/hotspot/share/utilities/elfFile.cpp line 1026: > 1024: break; > 1025: } else { > 1026: if (!_reader.move_position(4)) { Instead of hardcoding the `4` for lineptr/loclistptr/macptr/rangelistptr it would be nice to have a "DwarfOffset` constant of that value, since we only support 32 bit DWARF. src/hotspot/share/utilities/elfFile.cpp line 1070: > 1068: // reason, GCC is currently using version 3 as specified in the > DWARF 3 spec for the line number program even though GCC should > 1069: // be using version 4 for DWARF 4 as it emits DWARF 4 by default. > 1070: return false; According to the specification (pg112): > `version (uhalf)` > A version number (see Appendix F). This number is specific to the line number > information > and is independent of the DWARF version number. So this is just fine - actually things may break if the code accepted version 4 here assuming that there are breaking differences. On the other hand Appendix F mentions that DWARF4 contains .debug_line information in version 4. src/hotspot/share/utilities/elfFile.cpp line 1121: > 1119: // _debug_line_offset + 10 (=sizeof(_unit_length) + sizeof(_version) > + sizeof(_header_length)) + _header_length. > 1120: _header._file_names_offset = _reader.get_position(); > 1121: if (!_reader.set_position(shdr.sh_offset + _debug_line_offset + 10 + > _header._header_length)) { I would prefer a constant for this magic `10`. Thank you for the documentation. src/hotspot/share/utilities/elfFile.hpp line 211: > 209: > 210: // Load the DWARF file (.debuginfo) that belongs to this file. > 211: bool load_dwarf_file(); It would be nice to summarize from which places this methods tries to load the debug info to prevent the need for digging for it in the method implementation. src/hotspot/share/utilities/elfFile.hpp line 300: > 298: * - debug: Prints the results of the steps (1) - (4) together with the > generated line information matrix. > 299: * - trace: Full logging information for intermediate states/results > when parsing the DWARF file. > 300: */ Maybe add a comment that log output is only supported in non-product builds and the reason. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7126