On Tue, 8 Feb 2022 08:17:17 GMT, Christian Hagedorn <chaged...@openjdk.org> 
wrote:

>> When printing the native stack trace on Linux (mostly done for hs_err 
>> files), it only prints the method with its parameters and a relative offset 
>> in the method:
>> 
>> Stack: [0x00007f6e01739000,0x00007f6e0183a000],  sp=0x00007f6e01838110,  
>> free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
>> code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, 
>> bool, DirectiveSet*)+0xec
>> V  [libjvm.so+0x8303ef]  
>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, 
>> JavaThread*)+0x69
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f
>> 
>> This makes it sometimes difficult to see where exactly the methods were 
>> called from and sometimes almost impossible when there are multiple 
>> invocations of the same method within one method.
>> 
>> This patch improves this by providing source information (filename + line 
>> number) to the native stack traces on Linux similar to what's already done 
>> on Windows (see 
>> [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)):
>> 
>> Stack: [0x00007f34fca18000,0x00007f34fcb19000],  sp=0x00007f34fcb17110,  
>> free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
>> code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64  
>> (c1_Compilation.cpp:607)
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, 
>> bool, DirectiveSet*)+0xec  (c1_Compiler.cpp:250)
>> V  [libjvm.so+0x8303ef]  
>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899  
>> (compileBroker.cpp:2291)
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df  
>> (compileBroker.cpp:1966)
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, 
>> JavaThread*)+0x69  (compilerThread.cpp:59)
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d  
>> (thread.cpp:1297)
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167  (thread.cpp:1280)
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180  (thread.cpp:358)
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f  
>> (os_linux.cpp:705)
>> 
>> For Linux, we need to parse the debug symbols which are generated by GCC in 
>> DWARF - a standardized debugging format. This patch adds support for DWARF 
>> 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with 
>> x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still 
>> experimental and not generated for HotSpot. However, newer GCC version may 
>> soon generate DWARF 5 by default in which case this parser either needs to 
>> be extended or the build of HotSpot configured to only emit DWARF 4. 
>> 
>> The code follows the parsing steps described in the official DWARF 4 spec: 
>> https://dwarfstd.org/doc/DWARF4.pdf
>> I added references to the corresponding sections throughout the code. 
>> However, I tried to explain the steps from the DWARF spec directly in the 
>> code (method names, comments etc.). This allows to follow the code without 
>> the need to actually deep dive into the spec. 
>> 
>> The comments at the `Dwarf` class in the `elf.hpp` file explain in more 
>> detail how a DWARF file is structured and how the parsing algorithm works to 
>> get to the filename and line number information. There are more class 
>> comments throughout the `elf.hpp` file about how different DWARF sections 
>> are structured and how the parsing algorithm needs to fetch the required 
>> information. Therefore, I will not repeat the exact workings of the 
>> algorithm here but refer to the code comments. I've tried to add as much 
>> information as possible to improve the readability.
>> 
>> Generally, I've tried to stay away from adding any assertions as this code 
>> is almost always executed when already processing a VM error. Instead, the 
>> DWARF parser aims to just exit gracefully and possibly omit source 
>> information for a stack frame instead of risking to stop writing the hs_err 
>> file when an assertion would have failed. To debug failures, `-Xlog:dwarf` 
>> can be used with `info`, `debug` or `trace` which provides logging messages 
>> throughout parsing. 
>> 
>> **Testing:**
>> Apart from manual testing, I've added two kinds of tests:
>> - A JTreg test: Spawns new VMs to let them crash in various ways. The test 
>> reads the created hs_err files to check if the DWARF parsing could correctly 
>> find the filename and line number. For normal HotSpot files, I could not 
>> check against hardcoded filenames and line numbers as they are subject to 
>> change (especially line number can quickly become different). I therefore 
>> just added some sanity checks in the form of "found a non-empty file" and 
>> "found a non-zero line number". On top of that, I added tests that let the 
>> VM crash in custom C files (which will not change). This enables an 
>> additional verification of hardcoded filenames and line numbers.
>> - Gtests: Directly calling the `get_source()` method which initiates DWARF 
>> parsing. Tested some special cases, for example, having a buffer that is not 
>> big enough to store the filename.
>> 
>> On top of that, there are also existing JTreg tests that call 
>> `-XX:NativeMemoryTracking=detail` which will print a native stack trace with 
>> the new source information. These tests were also run as part of the 
>> standard tier testing and can be considered as sanity tests for this 
>> implementation.
>> 
>> To make tests work in our infrastructure or if some other setups want to 
>> have debug symbols at different locations, I've added support for an 
>> additional  `_JVM_DWARF_PATH` environment variable. This variable can 
>> specify a path from which the DWARF symbol file should be read by the parser 
>> if the default locations do not contain debug symbols (required some `make` 
>> changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. 
>> The JTreg test, however, also works if there are no symbols available. In 
>> that case, the test just skips all the assertion checks for the filename and 
>> line number.
>> 
>> I haven't run any specific performance testing as this new code is mainly 
>> executed when an error will exit the VM and only if symbol files are 
>> available (which is normally not the case when using Java release builds as 
>> a user).
>> 
>> Special thanks to @tschatzl for giving me some pointers to start based on 
>> his knowledge from a DWARF 2 parser he once wrote in Pascal and for 
>> discussing approaches on how to retrieve the source information and to 
>> @erikj79 for providing help for the changes required for `make`!
>>  
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Make dwarf tag NOT_PRODUCT

First pass, did not dive into details of the state machine yet.

src/hotspot/share/utilities/elfFile.cpp line 319:

> 317:     }
> 318:     log_develop_info(dwarf)("No separate .debuginfo file for library %s. 
> It already contains the required DWARF sections.", _filepath);
> 319:     _dwarf_file = new (std::nothrow) DwarfFile(_filepath);

Would it be useful to explicitly bail out on a `nullptr` value here to avoid 
crashes below?

src/hotspot/share/utilities/elfFile.cpp line 357:

> 355:   }
> 356: 
> 357:   strcpy(debug_pathname, _filepath);

I'm always a bit uneasy using "raw" `strcpy` instead of `strncpy` and friends. 
The code seems to be correct though.

src/hotspot/share/utilities/elfFile.cpp line 358:

> 356: 
> 357:   strcpy(debug_pathname, _filepath);
> 358:   char* last_slash = strrchr(debug_pathname, '/');

It's probably no big issue hardcoding the forward slash here instead of using 
`os::file_separator()` in this method.

src/hotspot/share/utilities/elfFile.cpp line 407:

> 405: bool ElfFile::load_dwarf_file_from_env_path_folder(const char* env_path, 
> const char* folder, const char* debug_filename, const uint32_t crc) {
> 406:   char* debug_pathname = NEW_RESOURCE_ARRAY(char, strlen(env_path) + 
> strlen(folder) + strlen(debug_filename) + 2);
> 407:   strcpy(debug_pathname, env_path);

Similar to other resource allocations, this should bail out if the result is 
`nullptr`.

src/hotspot/share/utilities/elfFile.cpp line 566:

> 564: // 
> http://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html#Separate-Debug-Files.
> 565: uint32_t ElfFile::gnu_debuglink_crc32(uint32_t crc, uint8_t* buf, const 
> size_t len) {
> 566:   crc = ~crc & 0xffffffff;

The masks are unnecessary here but don't hurt. Feel free to keep.

src/hotspot/share/utilities/elfFile.cpp line 576:

> 574:   log_develop_info(dwarf)("Open DWARF file: %s", filepath);
> 575:   _dwarf_file = new (std::nothrow) DwarfFile(filepath);
> 576:   if (!_dwarf_file->is_valid_dwarf_file()) {

This should bail out if the `new` returned a `nullptr`.

src/hotspot/share/utilities/elfFile.cpp line 686:

> 684:   }
> 685: 
> 686:   // We must align to twice the address size.

Since alignment is based on address size? I.e. above, at the check whether 
addresses are correct, define address size and then multiply by 2 here.
This would also make the condition above look nicer, i.e. move the `[NOT_]LP64` 
outside of the condition.

src/hotspot/share/utilities/elfFile.cpp line 784:

> 782:   }
> 783: 
> 784:   if (!_reader.read_byte(&_header._address_size) || 
> NOT_LP64(_header._address_size != 4)  LP64_ONLY( _header._address_size != 8)) 
> {

Since this is the second time for the clause `|| NOT_LP64(_header._address_size 
!= 4) LP64_ONLY( _header._address_size != 8)` maybe it is useful to make a 
constant out of the accepted address size somewhere instead of repeating this 
over and over.
It's value could even be something like `sizeof(intptr_t)` or so.

src/hotspot/share/utilities/elfFile.cpp line 814:

> 812:   log_develop_trace(dwarf)("Series of declarations [code, tag]:");
> 813:   AbbreviationDeclaration declaration;
> 814:   bool found_matching_declaration = false;

This variable is never used. Remove.

src/hotspot/share/utilities/elfFile.cpp line 944:

> 942: #else
> 943:       _reader.move_position(8);
> 944: #endif

Use `AddressSize` or similar here instead of the `#ifdef`.

src/hotspot/share/utilities/elfFile.cpp line 1026:

> 1024:         break;
> 1025:       } else {
> 1026:         if (!_reader.move_position(4)) {

Instead of hardcoding the `4` for lineptr/loclistptr/macptr/rangelistptr it 
would be nice to have a "DwarfOffset` constant of that value, since we only 
support 32 bit DWARF.

src/hotspot/share/utilities/elfFile.cpp line 1070:

> 1068:     // reason, GCC is currently using version 3 as specified in the 
> DWARF 3 spec for the line number program even though GCC should
> 1069:     // be using version 4 for DWARF 4 as it emits DWARF 4 by default.
> 1070:     return false;

According to the specification (pg112):

> `version (uhalf)`
> A version number (see Appendix F). This number is specific to the line number 
> information
> and is independent of the DWARF version number.

So this is just fine - actually things may break if the code accepted version 4 
here assuming that there are breaking differences.
On the other hand Appendix F mentions that DWARF4 contains .debug_line 
information in version 4.

src/hotspot/share/utilities/elfFile.cpp line 1121:

> 1119:   // _debug_line_offset + 10 (=sizeof(_unit_length) + sizeof(_version) 
> + sizeof(_header_length)) + _header_length.
> 1120:   _header._file_names_offset = _reader.get_position();
> 1121:   if (!_reader.set_position(shdr.sh_offset + _debug_line_offset + 10 + 
> _header._header_length)) {

I would prefer a constant for this magic `10`. Thank you for the documentation.

src/hotspot/share/utilities/elfFile.hpp line 211:

> 209: 
> 210:   // Load the DWARF file (.debuginfo) that belongs to this file.
> 211:   bool load_dwarf_file();

It would be nice to summarize from which places this methods tries to load the 
debug info to prevent the need for digging for it in the method implementation.

src/hotspot/share/utilities/elfFile.hpp line 300:

> 298:  *  - debug: Prints the results of the steps (1) - (4) together with the 
> generated line information matrix.
> 299:  *  - trace: Full logging information for intermediate states/results 
> when parsing the DWARF file.
> 300:  */

Maybe add a comment that log output is only supported in non-product builds and 
the reason.

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7126

Reply via email to