Re: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5]

Christian Hagedorn Tue, 29 Mar 2022 23:42:15 -0700

On Mon, 28 Mar 2022 13:14:50 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:


> this is impressive work. It's a big change, and I had a look at part of it. 
> I'll continue tomorrow.

Thanks a lot Thomas for your careful review! I'm in the process of working 
through your comments and will come back with an update today or later this 
week.

> In general, I'm concerned with the use of both UL and ResourceArea in this 
> code. I know the use of UL has been discussed, but still. 

I agree that it is problematic but I think it would be good to keep some 
logging around when later coming back to the parser code (and that's the only 
reason I think that you ever want to turn these logs on). I can currently think 
of two options:

- Leave UL in and just guard it with an additional new develop flag to exclude 
the logs from unfiltered UL logging. This would allow us to kinda accept the 
risks for debugging purposes. That's not really a good design though but we 
could keep the log levels with their time stamps.
- Replace all UL calls with `tty` and also guard them with a new develop flag 
and play around with `Verbose` and `WizardMode` to keep the different log 
levels. That's not great either but I think it's safer to use and we only want 
the logs on rare occasions anyways - so it might be acceptable to use these 
verbose flags even though we should generally get away from them.

> Use of RA will prevent us from getting useful callstacks if we crash and 
> Thread::current is NULL or invalid. I'd feel better if we were to 
> consistently rely on an outside scratch buffer (like we usually do in error 
> reporting). Even raw ::malloc would be better IMHO.

The idea of a scratch buffer sounds good. I'll check if I can replace all the 
`NEW_RESOURCE_ARRAY` usages with it.
 
> Another concern was safety, since this is a potential attack vector with 
> manipulated Dwarf files, if someone manages to provove a crash. Maybe far 
> fetched, but still. Would be good to get SonarCloud readings for this code 
> e.g.

I was also concerned about that and I'm very thankful that you've spotted some 
issues already! I think minimizing the risk of a potential attack should be a 
top priority. We should definitely add some more checks. What do you think 
about the usage of `_JVM_DWARF_PATH` to load a DWARF file? I'm not sure how 
safe it is. I originally had it enabled for debug builds only.

> We see test errors on Linux ppcle and x64 in gtests:

Could you try running it with `-Xlog:dwarf=info/debug` in order to find out why 
it failed? It might not have found the symbols. Is the JTreg test 
`TestDwarf.java` working? But there is now another problem that since using GCC 
11.2 (change done for Oracle builds with 
[JDK-8283057](https://bugs.openjdk.java.net/browse/JDK-8283057)), it emits 
unsupported DWARF 5 for some DWARF sections, at least on my machine, which is 
unfortunate. Maybe that's also the reason you see the failures if you use GCC 
11.2. Maybe we can mitigate this problem by forcing GCC to use DWARF 4 for now. 
Could that be done by using the `-gdwarf` GCC flag? @erikj79

> We also see Problems in runtime/ErrorHandling and in jfr/jvm/TestDumpOnCrash. 
> Mostly, these tests now have much longer runtimes (about factor 2). With 
> TestDumpOnCrash, both the error file writer and the test itself timeouted on 
> some of our slower machines.

Are these timeouts on ppcle and x64? We could also try to add 
`-Xlog:dwarf=info/debug` to the runs to get some rough idea of the time 
required to parse DWARF. I'll have a look at the these tests.

Thanks,
Christian

-------------

PR: https://git.openjdk.java.net/jdk/pull/7126

Re: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5]

Reply via email to