Re: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5]

Christian Hagedorn Tue, 29 Mar 2022 23:42:25 -0700

On Wed, 30 Mar 2022 06:35:27 GMT, Christian Hagedorn <chaged...@openjdk.org> 
wrote:


>> Hi Christian,
>> 
>> this is impressive work. It's a big change, and I had a look at part of it. 
>> I'll continue tomorrow.
>> 
>> In general, I'm concerned with the use of both UL and ResourceArea in this 
>> code. I know the use of UL has been discussed, but still. Use of RA will 
>> prevent us from getting useful callstacks if we crash and Thread::current is 
>> NULL or invalid. I'd feel better if we were to consistently rely on an 
>> outside scratch buffer (like we usually do in error reporting). Even raw 
>> ::malloc would be better IMHO.
>> 
>> Another concern was safety, since this is a potential attack vector with 
>> manipulated Dwarf files, if someone manages to provove a crash. Maybe far 
>> fetched, but still. Would be good to get SonarCloud readings for this code 
>> e.g.
>> 
>> More remarks inline.
>> 
>> Cheers, Thomas
>
>> this is impressive work. It's a big change, and I had a look at part of it. 
>> I'll continue tomorrow.
> 
> Thanks a lot Thomas for your careful review! I'm in the process of working 
> through your comments and will come back with an update today or later this 
> week.
> 
>> In general, I'm concerned with the use of both UL and ResourceArea in this 
>> code. I know the use of UL has been discussed, but still. 
> 
> I agree that it is problematic but I think it would be good to keep some 
> logging around when later coming back to the parser code (and that's the only 
> reason I think that you ever want to turn these logs on). I can currently 
> think of two options:
> 
> - Leave UL in and just guard it with an additional new develop flag to 
> exclude the logs from unfiltered UL logging. This would allow us to kinda 
> accept the risks for debugging purposes. That's not really a good design 
> though but we could keep the log levels with their time stamps.
> - Replace all UL calls with `tty` and also guard them with a new develop flag 
> and play around with `Verbose` and `WizardMode` to keep the different log 
> levels. That's not great either but I think it's safer to use and we only 
> want the logs on rare occasions anyways - so it might be acceptable to use 
> these verbose flags even though we should generally get away from them.
> 
>> Use of RA will prevent us from getting useful callstacks if we crash and 
>> Thread::current is NULL or invalid. I'd feel better if we were to 
>> consistently rely on an outside scratch buffer (like we usually do in error 
>> reporting). Even raw ::malloc would be better IMHO.
> 
> The idea of a scratch buffer sounds good. I'll check if I can replace all the 
> `NEW_RESOURCE_ARRAY` usages with it.
>  
>> Another concern was safety, since this is a potential attack vector with 
>> manipulated Dwarf files, if someone manages to provove a crash. Maybe far 
>> fetched, but still. Would be good to get SonarCloud readings for this code 
>> e.g.
> 
> I was also concerned about that and I'm very thankful that you've spotted 
> some issues already! I think minimizing the risk of a potential attack should 
> be a top priority. We should definitely add some more checks. What do you 
> think about the usage of `_JVM_DWARF_PATH` to load a DWARF file? I'm not sure 
> how safe it is. I originally had it enabled for debug builds only.
> 
>> We see test errors on Linux ppcle and x64 in gtests:
> 
> Could you try running it with `-Xlog:dwarf=info/debug` in order to find out 
> why it failed? It might not have found the symbols. Is the JTreg test 
> `TestDwarf.java` working? But there is now another problem that since using 
> GCC 11.2 (change done for Oracle builds with 
> [JDK-8283057](https://bugs.openjdk.java.net/browse/JDK-8283057)), it emits 
> unsupported DWARF 5 for some DWARF sections, at least on my machine, which is 
> unfortunate. Maybe that's also the reason you see the failures if you use GCC 
> 11.2. Maybe we can mitigate this problem by forcing GCC to use DWARF 4 for 
> now. Could that be done by using the `-gdwarf` GCC flag? @erikj79
> 
>> We also see Problems in runtime/ErrorHandling and in 
>> jfr/jvm/TestDumpOnCrash. Mostly, these tests now have much longer runtimes 
>> (about factor 2). With TestDumpOnCrash, both the error file writer and the 
>> test itself timeouted on some of our slower machines.
> 
> Are these timeouts on ppcle and x64? We could also try to add 
> `-Xlog:dwarf=info/debug` to the runs to get some rough idea of the time 
> required to parse DWARF. I'll have a look at the these tests.
> 
> Thanks,
> Christian

> @chhagedorn
> 
> > There is still some code that could be shared though like opening a DWARF 
> > file with its checks or reading an LEB 128 etc. Might be worth to 
> > investigate further if the two implementations can be merged/reused to some 
> > extent. But I propose to file a separate RFE for that. What do you think?
> 
> Yeah, let's investigate about it in another RFE.
> 
> IMHO we can share some codes about DWARF between HotSpot and SA, and also we 
> might need DWARF-based call frame parser in HotSpot because some 3rd-party 
> native libraries don't use base pointer (RBP) to store SP due to 
> optimization. In SA side, it would be useful if we can check native source 
> file and line number in mixed jstack with your change.

I see, then it makes sense to unify these parsers later.

> 
> So I want to unify DWARF parser (processor) between HotSpot and SA, but it 
> might be long journey... thus I agree with you to file it as another RFE.

Sounds good, I'll file an RFE and link it to this RFE.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7126

Re: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5]

Reply via email to