Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default
On Thu, 9 May 2024 05:04:47 GMT, Thomas Stuefe wrote: >> On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, >> and (when run on Mac M1 hardware) 16K. >> >> Since forgetting to specify `--enable-compatible-cds-alignment` is a common >> error that is only noticed when running the produced JVM on hardware with >> different page size, we propose to enable that option by default on Linux >> aarch64. The cost is a moderate increase in CDS archive size (about 300K). >> >> I tested this patch on: >> - x64 Linux >> - x64 Linux, crossbuilding to aarch64 >> - building natively on aarch64 Linux > > Does anyone else have an opinion on this matter? The question is: > > "Should we enable cds-compatible alignment in the configure options by > default generally, not only for Linux aarch64" ? > > This would, in addition to the proposed change affecting Linux on aarch64, > also affect MacOS x64. JVMs built there would be able to run, by default, in > Rosetta. At the price of sligthly increased archive size of a dozen KB or so. > > If nobody objects, I will do that since I like the simplicity of it. > @tstuefe We already have an issue to reset the default to true (raised by > @iklam) > > https://bugs.openjdk.org/browse/JDK-8331952 Okay. I see it was created yesterday evening. @iklam how do you want to handle this? Do you want to proceed with 8331952 and I close this as duplicate? Or vice versa? - PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2102339405
Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default
On Wed, 8 May 2024 15:14:16 GMT, Thomas Stuefe wrote: > On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, > and (when run on Mac M1 hardware) 16K. > > Since forgetting to specify `--enable-compatible-cds-alignment` is a common > error that is only noticed when running the produced JVM on hardware with > different page size, we propose to enable that option by default on Linux > aarch64. The cost is a moderate increase in CDS archive size (about 300K). > > I tested this patch on: > - x64 Linux > - x64 Linux, crossbuilding to aarch64 > - building natively on aarch64 Linux Does anyone else have an opinion on this matter? The question is: "Should we enable cds-compatible alignment in the configure options by default generally, not only for Linux aarch64" ? This would, in addition to the proposed change affecting Linux on aarch64, also affect MacOS x64. JVMs built there would be able to run, by default, in Rosetta. At the price of sligthly increased archive size of a dozen KB or so. If nobody objects, I will do that since I like the simplicity of it. - PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2101941697
Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default
On Wed, 8 May 2024 17:01:26 GMT, Andrew Haley wrote: > This is obviously correct. Otherwise, the CDS archive will only work if the > JDK happens to have been built on a machine with greater or equal page size. > > Does the default `compatible-cds-alignment=false` make any sense anywhere on > other platforms, though? We might simply make compatibility the default. The configure option is also evaluated for MacOS x64 to enforce an alignment of 16K, to be able to run on MacOS aarch64 in Rosetta. Seeing how MacOS x64 is not really important anymore, and losses due to 16K alignment are small, I'd be fine with setting the default to compatible. It would be a simpler patch, certainly. - PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2101138971
RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default
On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, and (when run on Mac M1 hardware) 16K. Since forgetting to specify `--enable-compatible-cds-alignment` is a common error that is only noticed when running the produced JVM on hardware with different page size, we propose to enable that option by default on Linux aarch64. The cost is a moderate increase in CDS archive size (about 300K). I tested this patch on: - x64 Linux - x64 Linux, crossbuilding to aarch64 - building natively on aarch64 Linux - Commit messages: - aarch64-should-default-to-64k-alignment-for-cds-regions Changes: https://git.openjdk.org/jdk/pull/19142/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19142=00 Issue: https://bugs.openjdk.org/browse/JDK-8331942 Stats: 12 lines in 1 file changed: 11 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19142.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19142/head:pull/19142 PR: https://git.openjdk.org/jdk/pull/19142
Re: RFR: 8329581: Java launcher no longer prints a stack trace [v9]
On Mon, 6 May 2024 19:06:10 GMT, Sonia Zaldana Calles wrote: >> Hi folks, >> >> This PR aims to fix >> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). >> >> I think the regression got introduced in >> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). >> >> In the issue linked above, >> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226) >> got removed to simplify launcher code. >> >> Previously, we used ```getMainType``` to do the appropriate main method >> invocation in ```JavaMain```. However, we currently attempt to do all types >> of main method invocations at the same time >> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623). >> >> >> Note how all of these invocations clear the exception reported with >> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390). >> >> >> Therefore, if a legitimate exception comes up during one of these >> invocations, it does not get reported. >> >> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking >> forward to your suggestions. >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one > additional commit since the last revision: > > Fixing indentation > > Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com> Pre-existing: Man, I cannot grok the complex return code handling, tbh. We have the local `ret` variable holding a return code. We also hand codes to CHECK_EXCEPTION_LEAVE as macro argument. But we don't hand codes to CHECK_EXCEPTION_NULL_LEAVE. LEAVE uses the locally defined `ret` instead of getting the return code as argument. CHECK_EXCEPTION_LEAVE modifies the local `ret`, but CHECK_EXCEPTION_NULL_LEAVE does not. CHECK_EXCEPTION_NULL_LEAVE does not set `ret`. So, in case of an error, it would cause the launcher to return OK, but this does not happen because the local `ret` gets initialized to 1 before the first call to CHECK_EXCEPTION_NULL_LEAVE (line 566 resp. 560). Not sure if this was intentional, but it surely is very brittle. We rely on the content of `ret`, and that changes several times throughout JavaMain. CHECK_EXCEPTION_NULL_LEAVE argument is named CENL_exception, which I don't understand. To confuse matters more, the logic for internal error codes and the launcher return code is reversed: internally, 0 means error, and externally, 0 means success. And we only use numerical literals (`1`, `0`) instead of clearly named constants. This may be food for another RFE, to keep this patch minimal. But a good solution, to me, would be like this: - have the same logic for return codes (1 = error, 0 = success) to ease understanding - have clearly named constants (e.g. "LAUNCHER_OK" 0, "LAUNCHER_ERR" = 1) - have the LEAVE macro take the launcher return code as argument - have all xxx_LEAVE macros pass in LAUNCHER_ERR to LEAVE - call the final LEAVE with LAUNCHER_OK - optionally, define something like "LEAVE_ERR" and "LEAVE_OK" that call LEAVE with either LAUNCHER_ERR or LAUNCHER_OK, for more concise coding. For this patch, I think the return code logic is okay, but I would feel better if others double-checked. src/java.base/share/native/libjli/java.c line 394: > 392: if ((*env)->ExceptionOccurred(env)) { \ > 393: return 0; \ > 394: } else if (obj == NULL) { \ Side note, I first wondered if this comparison is strictly correct, since we now pass in `jmethodID` as well as `jobject`, which are opaque types and not necessarily of the same size. But seems that jmethodID==NULL is defined to mean "invalid" [1] by the spec. Requiring NULL instead of providing an opaque invalid constant feels like an odd choice in the original JNI spec, since it requires implementors to use a pointer type to implement jmethodID? Which we do, in OpenJDK [2]. [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#getstaticmethodid [2] https://github.com/openjdk/jdk/blob/2baacfc16916220846743c6e49a99a6c41cac510/src/java.base/share/native/include/jni.h#L135-L136 src/java.base/share/native/libjli/java.c line 420: > 418: jmethodID mainID = > 419: (*env)->GetMethodID(env, mainClass, "main", > "([Ljava/lang/String;)V"); > 420: CHECK_EXCEPTION_NULL_FAIL(mainID); Is there a particular reason why you moved this section up here, from line 432 before? If not, I'd restore it to its original position to keep the diff small. src/java.base/share/native/libjli/java.c line 452: > 450: jobject mainObject = (*env)->NewObject(env, mainClass, constructor); > 451: CHECK_EXCEPTION_NULL_FAIL(mainObject); > 452: jmethodID mainID = (*env)->GetMethodID(env, mainClass, "main", > "()V"); Unnecessary change. Please restore
Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v6]
On Thu, 25 Apr 2024 13:22:01 GMT, Matthias Baesken wrote: >> We have already good JLI tracing capabilities. But GetApplicationHome and >> GetApplicationHomeFromDll lack some tracing and should be enhanced. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > remove /jre path check Change still good > Still looks good. We should maybe change the synopsis (title) of the bug to > something like "libjli GetApplicationHome cleanups"? Well, it does enhance tracing, so the title is not lying :) - PR Comment: https://git.openjdk.org/jdk/pull/18699#issuecomment-2078751493
Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v5]
On Tue, 23 Apr 2024 14:31:44 GMT, Matthias Baesken wrote: >> We have already good JLI tracing capabilities. But GetApplicationHome and >> GetApplicationHomeFromDll lack some tracing and should be enhanced. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > adjust output LGTM - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18699#pullrequestreview-2022399268
Re: RFR: 8322043: HeapDumper should use parallel dump by default [v3]
On Wed, 17 Apr 2024 07:22:55 GMT, David Holmes wrote: >> I think it makes the code more flexible - it allows to distinguish between >> "use default value" and "I don't care" cases. > > I'm not sure it is a worthwhile distinction. Not passing an actual parameter > means "I don't care - take the default". I agree with @dholmes-ora. Let's keep things simple. - PR Review Comment: https://git.openjdk.org/jdk/pull/18748#discussion_r1568833440
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains ten additional >> commits since the last revision: >> >> - Merge branch 'master' into jdk-8261242-is-containerized-fix >> - jcheck fixes >> - Fix tests >> - Implement Metrics.isContainerized() >> - Some clean-up >> - Drop cgroups testing on plain Linux >> - Implement fall-back logic for non-ro controller mounts >> - Make find_ro static and local to compilation unit >> - 8261242: [Linux] OSContainer::is_containerized() returns true > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > >> 168: } >> 169: } >> 170: return false; > > An alternative, simpler, no need for modifying source string: > > static bool find_ro_opt(const char* o) { > return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); > } Please disregard my comment. Albeit longer, your version is clearer to read and more fault tolerant. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > >> 349: // >> 350: // We collect the read only mount option in the cgroup infos so as >> to have that >> 351: // info ready when determining is_containerized(). > > Here, and in other places: a comment indicating the line format we scan would > be appreciated, possibly with argument numbers. Saves the casual code reader > from looking into proc man page. Even just pasting the example line for proc > manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) > (but with order adapted to your scanf call, they count major:minor as one) Trying to parse the `%s%*[^-]-` So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- parses everything that is not a dash, until we encounter the dash? Then we eat the dash? This is to skip the optionals? - PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows >> it to figure out whether the JVM is actually running inside a container >> (`podman`, `docker`, `crio`), or with some other means that enforces >> memory/cpu limits by means of the cgroup filesystem. If neither of those >> conditions hold, the JVM runs in not containerized mode, addressing the >> issue described in the JBS tracker. For example, on my Linux system >> `is_containerized() == false" is being indicated with the following trace >> log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false >> because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new >> (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup >> controllers typically get mounted read only into containers. Note that the >> current container tests assert that `OSContainer::is_containerized() == >> true` in various tests. Therefore, using the heuristic of "is any memory or >> cpu limit present" isn't sufficient. I had considered that in an earlier >> iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of >> claiming a containerized system being present when it's actually just a >> regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 >> (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains ten additional > commits since the last revision: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - Fix tests > - Implement Metrics.isContainerized() > - Some clean-up > - Drop cgroups testing on plain Linux > - Implement fall-back logic for non-ro controller mounts > - Make find_ro static and local to compilation unit > - 8261242: [Linux] OSContainer::is_containerized() returns true I am not enough of a container expert to judge if the basic approach is right - I trust you on this. This is just a technical code review. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > 168: } > 169: } > 170: return false; An alternative, simpler, no need for modifying source string: static bool find_ro_opt(const char* o) { return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); } src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > 349: // > 350: // We collect the read only mount option in the cgroup infos so as > to have that > 351: // info ready when determining is_containerized(). Here, and in other places: a comment indicating the line format we scan would be appreciated, possibly with argument numbers. Saves the casual code reader from looking into proc man page. Even just pasting the example line for proc manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but with order adapted to your scanf call, they count major:minor as one) src/hotspot/os/linux/osContainer_linux.cpp line 78: > 76: const char *reason; > 77: bool any_mem_cpu_limit_present = false; > 78: bool ctrl_ro = cgroup_subsystem->is_containerized(); nit: naming? what does ctrl mean in this case? Maybe use "cgroup_is_containerized"? src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375: > 373: if (!c.isContainerized()) { > 374: ostream.println(INDENT + "System not containerized."); > 375: return; Why return here? Would this not cut the output short in the non-containerized case? And if this not intended, the not-containerized-`-XshowSettings:system` test below should test and catch this (e.g. scan for CPU set) - PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains ten additional >> commits since the last revision: >> >> - Merge branch 'master' into jdk-8261242-is-containerized-fix >> - jcheck fixes >> - Fix tests >> - Implement Metrics.isContainerized() >> - Some clean-up >> - Drop cgroups testing on plain Linux >> - Implement fall-back logic for non-ro controller mounts >> - Make find_ro static and local to compilation unit >> - 8261242: [Linux] OSContainer::is_containerized() returns true > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > >> 168: } >> 169: } >> 170: return false; > > An alternative, simpler, no need for modifying source string: > > static bool find_ro_opt(const char* o) { > return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); > } Please disregard my comment. Albeit longer, your version is clearer to read and more fault tolerant. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > >> 349: // >> 350: // We collect the read only mount option in the cgroup infos so as >> to have that >> 351: // info ready when determining is_containerized(). > > Here, and in other places: a comment indicating the line format we scan would > be appreciated, possibly with argument numbers. Saves the casual code reader > from looking into proc man page. Even just pasting the example line for proc > manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) > (but with order adapted to your scanf call, they count major:minor as one) Trying to parse the `%s%*[^-]-` So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- parses everything that is not a dash, until we encounter the dash? Then we eat the dash? This is to skip the optionals? - PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows >> it to figure out whether the JVM is actually running inside a container >> (`podman`, `docker`, `crio`), or with some other means that enforces >> memory/cpu limits by means of the cgroup filesystem. If neither of those >> conditions hold, the JVM runs in not containerized mode, addressing the >> issue described in the JBS tracker. For example, on my Linux system >> `is_containerized() == false" is being indicated with the following trace >> log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false >> because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new >> (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup >> controllers typically get mounted read only into containers. Note that the >> current container tests assert that `OSContainer::is_containerized() == >> true` in various tests. Therefore, using the heuristic of "is any memory or >> cpu limit present" isn't sufficient. I had considered that in an earlier >> iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of >> claiming a containerized system being present when it's actually just a >> regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 >> (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains ten additional > commits since the last revision: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - Fix tests > - Implement Metrics.isContainerized() > - Some clean-up > - Drop cgroups testing on plain Linux > - Implement fall-back logic for non-ro controller mounts > - Make find_ro static and local to compilation unit > - 8261242: [Linux] OSContainer::is_containerized() returns true I am not enough of a container expert to judge if the basic approach is right - I trust you on this. This is just a technical code review. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > 168: } > 169: } > 170: return false; An alternative, simpler, no need for modifying source string: static bool find_ro_opt(const char* o) { return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); } src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > 349: // > 350: // We collect the read only mount option in the cgroup infos so as > to have that > 351: // info ready when determining is_containerized(). Here, and in other places: a comment indicating the line format we scan would be appreciated, possibly with argument numbers. Saves the casual code reader from looking into proc man page. Even just pasting the example line for proc manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but with order adapted to your scanf call, they count major:minor as one) src/hotspot/os/linux/osContainer_linux.cpp line 78: > 76: const char *reason; > 77: bool any_mem_cpu_limit_present = false; > 78: bool ctrl_ro = cgroup_subsystem->is_containerized(); nit: naming? what does ctrl mean in this case? Maybe use "cgroup_is_containerized"? src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375: > 373: if (!c.isContainerized()) { > 374: ostream.println(INDENT + "System not containerized."); > 375: return; Why return here? Would this not cut the output short in the non-containerized case? And if this not intended, the not-containerized-`-XshowSettings:system` test below should test and catch this (e.g. scan for CPU set) - PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248
Re: RFR: 8329581: Java launcher no longer prints a stack trace
On Tue, 16 Apr 2024 07:55:26 GMT, Thomas Stuefe wrote: >> Hi folks, >> >> This PR aims to fix >> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). >> >> I think the regression got introduced in >> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). >> >> In the issue linked above, >> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226) >> got removed to simplify launcher code. >> >> Previously, we used ```getMainType``` to do the appropriate main method >> invocation in ```JavaMain```. However, we currently attempt to do all types >> of main method invocations at the same time >> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623). >> >> >> Note how all of these invocations clear the exception reported with >> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390). >> >> >> Therefore, if a legitimate exception comes up during one of these >> invocations, it does not get reported. >> >> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking >> forward to your suggestions. >> >> Cheers, >> Sonia > > src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912: > >> 910: private static final int MAIN_WITHOUT_ARGS = 1; >> 911: private static final int MAIN_NONSTATIC = 2; >> 912: private static int mainType = 0; > > Nit: transferring information from java to C in this way is usually done, > AFAICS, by accessing static fields directly > (GetStaticFieldID+GetStaticXXXField). There is also a helper function that > wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available > in libjli though. There are many examples for both patterns. > > I also would consider just declaring two booleans here, isStatic and hasArgs, > which that would be a bit clearer to read instead of combining both into a > single flag variable, and no need to keep flag values synced across java/C. Thinking about this some more, would it not be possible to just use the mainMethod directly down in C? - PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566948449
Re: RFR: 8329581: Java launcher no longer prints a stack trace
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR aims to fix > [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). > > I think the regression got introduced in > [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). > > In the issue linked above, > [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226) > got removed to simplify launcher code. > > Previously, we used ```getMainType``` to do the appropriate main method > invocation in ```JavaMain```. However, we currently attempt to do all types > of main method invocations at the same time > [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623). > > > Note how all of these invocations clear the exception reported with > [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390). > > > Therefore, if a legitimate exception comes up during one of these > invocations, it does not get reported. > > I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking > forward to your suggestions. > > Cheers, > Sonia > Isn't this already fixed by #18753? > Yes, it would be good to try the changes in pull/18753 first. I still like @SoniaZaldana 's variant better. AFAIU, in the original variant, we choose the main method via MethodFinder.findMainMethod, but then down in JavaMain in C we just try to invoke all variants in succession (static,args->instance,args->static,noargs->instance,noargs). Does that no mean we ignore the decision of findMainMethod? Apart from that, it means fewer exceptions are thrown when starting the JVM if we avoid calling methods we know don't work. - PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2058521084
Re: RFR: 8329581: Java launcher no longer prints a stack trace
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR aims to fix > [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). > > I think the regression got introduced in > [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). > > In the issue linked above, > [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226) > got removed to simplify launcher code. > > Previously, we used ```getMainType``` to do the appropriate main method > invocation in ```JavaMain```. However, we currently attempt to do all types > of main method invocations at the same time > [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623). > > > Note how all of these invocations clear the exception reported with > [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390). > > > Therefore, if a legitimate exception comes up during one of these > invocations, it does not get reported. > > I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking > forward to your suggestions. > > Cheers, > Sonia Good find. src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912: > 910: private static final int MAIN_WITHOUT_ARGS = 1; > 911: private static final int MAIN_NONSTATIC = 2; > 912: private static int mainType = 0; Nit: transferring information from java to C in this way is usually done, AFAICS, by accessing static fields directly (GetStaticFieldID+GetStaticXXXField). There is also a helper function that wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available in libjli though. There are many examples for both patterns. I also would consider just declaring two booleans here, isStatic and hasArgs, which that would be a bit clearer to read instead of combining both into a single flag variable, and no need to keep flag values synced across java/C. - PR Review: https://git.openjdk.org/jdk/pull/18786#pullrequestreview-2002891641 PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566896963
Re: RFR: 8330171: Lazy W^X switch implementation
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin wrote: > An alternative for preemptively switching the W^X thread mode on macOS with > an AArch64 CPU. This implementation triggers the switch in response to the > SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this > approach, it is now feasible to eliminate all WX guards and avoid potentially > costly operations. However, no significant improvement or degradation in > performance has been observed. Additionally, considering the issue with > AsyncGetCallTrace, the patched JVM has been successfully operated with > [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and > [async-profiler](https://github.com/async-profiler/async-profiler). > > Additional testing: > - [x] MacOS AArch64 server fastdebug *gtets* > - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4* > - [ ] Benchmarking > > @apangin and @parttimenerd could you please check the patch on your > scenarios?? I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call? In other words, how fine granular can we get without incurring too high a cost? - PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2053721713
Re: RFR: 8322043: HeapDumper should use parallel dump by default
On Fri, 12 Apr 2024 18:50:37 GMT, Alex Menkov wrote: > > I am curious: what is the memory overhead for parallel mode, and (I am not > > familiar with the logic) how many threads are involved? Is the number of > > thread bounded? > > I ask because, especially for the OnOOM handling, we may already be at a > > limit memory-wise. Starting to swap will probably be worse than running > > single-threaded. > > Good question. It think it's several MB per each additional thread (1MB > output buffer, DumperClassCacheTable - 1031 elements max, element size > depends on class field numbers, if HeapDumpGzipLevel is set, some buffers for > gzip compressors) Number of threads by default is min of > `os::initial_active_processor_count() * 3 / 8` and number of GC workers. For the OOM case, I would probably make it somehow dependent on os::free_memory() then. - PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2053517286
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 10:07:48 GMT, Claes Redestad wrote: > I guess Lilliput adds some hard or soft limit on the number of classes loaded? Yes, we are concerned with that, especially for a possible future where Lilliput is the sole default. Atm we can address about 4 million classes. There are thoughts about making this number of classes infinite, but if possible we would like to avoid that complexity. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051597166
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote: > This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Fyi: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051184229
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote: > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low Stupid question maybe, this would be one LambdaForm per argument set? E.g. would two invocations with the same arguments re-use the same LambdaForm? I would like to get an understanding of the numbers of classes involved for these solutions, and whether they are bounded. Mostly for Lilliput reason. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051082454
Re: RFR: 8322043: HeapDumper should use parallel dump by default
On Fri, 12 Apr 2024 02:17:34 GMT, Alex Menkov wrote: > The fix makes VM heap dumping parallel by default. > `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default, the fix > affects `HotSpotDiagnosticMXBean.dumpHeap()`, `-XX:+HeapDumpBeforeFullGC`, > `-XX:+HeapDumpAfterFullGC` and `-XX:+HeapDumpOnOutOfMemoryError`. > > Testing: > - manually tested different heap dump scenarios with `-Xlog:heapdump`; > - tier1,tier2,hs-tier5-svc; > - all reg.tests that use heap dump. I am curious: what is the memory overhead for parallel mode, and (I am not familiar with the logic) how many threads are involved? Is the number of thread bounded? I ask because, especially for the OnOOM handling, we may already be at a limit memory-wise. Starting to swap will probably be worse than running single-threaded. - PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2051046673
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]
On Wed, 10 Apr 2024 13:46:11 GMT, Martin Doerr wrote: >> In my humble opinion the inclusion of alloca.h was slightly cleaner, but I >> guess it doesn't matter. Out of curiosity, why do you guys prefer not >> including it? > > When only looking at AIX code, I think the inclusion of alloca.h was cleaner. > Agreed. The new code makes AIX behave like other platforms and avoids the AIX > specific part in shared code. > I could live with either version. I can live with either, too. - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1559536724
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v7]
On Wed, 10 Apr 2024 12:15:34 GMT, Joachim Kern wrote: >> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building >> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect >> clang by another name, and it uses the clang toolchain in the JDK build. >> Thus the old xlc toolchain was removed by >> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701). >> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the >> last xlc rudiment. >> This means merging the AIX specific content of >> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp >> into the corresponding gcc files on the on side and removing the >> defined(TARGET_COMPILER_xlc) blocks in the code, because the >> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX >> compiler. >> The rest of the changes are needed because of using >> utilities/compilerWarnings_gcc.hpp the compiler is much more nagging about >> ill formatted printf > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > saver solution This looks good to me now, provided Martin likes it too. Thanks for incorporating my suggestions. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18536#pullrequestreview-1991847641
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]
On Tue, 2 Apr 2024 09:19:16 GMT, Joachim Kern wrote: >> Hi Thomas, >> I would like to get totally rid of this, because as I mentioned IBM already >> modified the `stdlib.h` header not using `#define malloc vec_malloc` any >> more (and all the other vec_... defines). We have to ask the adoptium >> colleagues at IBM if they already have raised their build environment by the >> 2 SP levels needed. >> In principle we had to do the same workaround for `calloc, free,...` too, >> but they didn't show up as errors in the logging files. >> These lines where never meant to stay for long. Just to be able to compile >> until IBM fixes the issue, which is done now. > > @suchismith1993 > Hi Suchi, can you please tell me when you will raise your build environment > from AIX 7.2 TL5 SP5 to SP7? > I' am asking you, because I want to get rid of this nasty workaround. Pinging @sxa - what build environment does temurin use for AIX? - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558020493
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]
On Tue, 2 Apr 2024 16:14:12 GMT, Joachim Kern wrote: >> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building >> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect >> clang by another name, and it uses the clang toolchain in the JDK build. >> Thus the old xlc toolchain was removed by >> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701). >> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the >> last xlc rudiment. >> This means merging the AIX specific content of >> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp >> into the corresponding gcc files on the on side and removing the >> defined(TARGET_COMPILER_xlc) blocks in the code, because the >> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX >> compiler. >> The rest of the changes are needed because of using >> utilities/compilerWarnings_gcc.hpp the compiler is much more nagging about >> ill formatted printf > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > version check not needed anymore src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 440: > 438: st->print("pc =" INTPTR_FORMAT " ", (unsigned > long)uc->uc_mcontext.jmp_context.iar); > 439: st->print("lr =" INTPTR_FORMAT " ", (unsigned > long)uc->uc_mcontext.jmp_context.lr); > 440: st->print("ctr=" INTPTR_FORMAT " ", (unsigned > long)uc->uc_mcontext.jmp_context.ctr); p2i src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 443: > 441: st->cr(); > 442: for (int i = 0; i < 32; i++) { > 443: st->print("r%-2d=" INTPTR_FORMAT " ", i, (unsigned > long)uc->uc_mcontext.jmp_context.gpr[i]); p2i - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558017408 PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558017827
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]
On Tue, 2 Apr 2024 10:26:42 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/os_aix.cpp line 314: >> >>> 312: ErrnoPreserver ep; >>> 313: log_trace(os, map)("disclaim failed: " RANGEFMT " errno=(%s)", >>> 314: RANGEFMTARGS(p, (long)maxDisclaimSize), >> >> Wait, why are these casts needed? maxDisclaimSize is size_t, RANGEFMT uses >> SIZE_FORMAT. That should work without cast. > > Hi Thomas, `maxDisclaimSize` is of type `unsigned int`; therefore I get the > following warning: > > os/aix/os_aix.cpp:314:42: error: format specifies type 'unsigned long' but > the argument has type 'unsigned int' [-Werror,-Wformat] > RANGEFMTARGS(p, maxDisclaimSize), > ^~~ > > Should I keep the casts, or change the type of `maxDisclaimSize, > numFullDisclaimsNeeded, lastDisclaimSize` to `const unsigned long`? I would change them to size_t. Thanks for doing this. - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558012122
Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]
On Thu, 28 Mar 2024 17:15:26 GMT, Kevin Walls wrote: >>> In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard >>> since it guards a whole swathe of switches that we may instruct the >>> customer to enable. Once enabled, my experience is that >>> UnlockDiagnosticVMOptions often lingers around. It is not unusual for >>> customer scenarios to have set +UnlockDiagnosticVMOptions because of some >>> years ago support cases. >> >> I think we also need to consider the flip side of this argument. Is this >> something that some customers might want to always enable, but don't want to >> always have UnlockDiagnosticVMOptions enabled. A new command line flag would >> be needed in that case. >> >> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of >> diagnostic command line flags? Do we have examples of it enabling a hotspot >> feature that does not also require setting a diagnostic command line flag? > >> I think we also need to consider the flip side of this argument. Is this >> something that some customers might want to always enable, but don't want to >> always have UnlockDiagnosticVMOptions enabled. A new command line flag would >> be needed in that case. >> >> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of >> diagnostic command line flags? Do we have examples of it enabling a hotspot >> feature that does not also require setting a diagnostic command line flag? > > Thanks Chris - that is correct now I check the wording, > UnlockDiagnosticVMOptions: "Enable normal processing of flags relating to > field diagnostics" > > Yes it makes flags which are DIAGNOSTIC available at the command-line. We > have UnlockExperimentalVMOptions also. > > My original reason for the guard, is that the risk of fooling the "good oop" > check is that we could possibly crash (so that's answering your other > question). Inspecting an arbitrary pointer that looks enough like a Java > heap object to fool the test, means it might try and resolve bad addresses > for Klass pointer or field data. > > I also have a stress test which examines every address in a Java heap to test > for crashes. That's obviously long-running as it does a jcmd for every > iteration. But I can't currently get a crash, trying G1, ZGC, > ZGCGenerational. > > Having a global guard to prevent usage of VM.inspect may be needed less than > I thought, but will wait on the other thread before saying that is really the > case. We could have a new -XX flag to guard it (whether specific to this > command or a more general UnlockDiagnosticFeatures. I would definitely stick > to this being in all builds, as we frequently "debug" non-debug builds. > > Also, for your comment above about the new version of dbg_is_good_oop: this > was added because although there are not many callers of it, I did not want > to upset them. During my earlier testing the detailed version of this method > did upset something, although I cannot reproduce that today. > > I will rerun some tests on these items... > @kevinjwalls Thanks for clarifying the relevant behaviours and code paths as > well as identifying a few places where documentation might help devs avoid > tripping over any issues. Much appreciated. @kevinjwalls I am fine with this as well. Thank you for answering our questions. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2042743210
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc
On Fri, 29 Mar 2024 07:56:10 GMT, Thomas Stuefe wrote: >> src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 50: >> >>> 48: #undef malloc >>> 49: extern void *malloc(size_t) asm("vec_malloc"); >>> 50: #endif >> >> This `#if` is not needed if we are building on AIX 7.2 TL5 SP7 or higher. >> This is the case in our build environment, but I think IBM is still building >> with SP5, which would run into build errors if I remove this `#if` now. > > While looking at this, I noticed that my question in > https://github.com/openjdk/jdk/pull/14146#discussion_r1207078176 and > followups had never been answered. Do you know the answers now? > > Quoting myself: > >> So, we do this only for malloc? Not for calloc, posix_memalign, realloc etc? >> What about free? >> Removing that define and hard-coding it here assumes ... pointers it returns >> work with the unchanged free() and realloc() the system provides, and will >> always do so. >> I am basically worried that undefining malloc, even if it seems harmless >> now, exposes us to difficult-to-investigate problems down the road, since it >> depends on how the libc devs will reform those macros in the future. Other than that, and kind of depending on your answer: How important is it that we catch every use of the original malloc? Can be safely mix the original malloc with vec_malloc if logging is not involved? I am asking, because from that it depends whether this hunk needs to appear right behind `#include ` or whether we can move it into the middle of the file together with the other AIX stuff. Because, if we move it into the middle of the file, we may miss any uses of malloc that may happen in system headers (would be unusual for that to happen but with IBM one never knows). - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544201412
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc
On Thu, 28 Mar 2024 16:57:00 GMT, Joachim Kern wrote: >> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building >> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect >> clang by another name, and it uses the clang toolchain in the JDK build. >> Thus the old xlc toolchain was removed by >> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701). >> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the >> last xlc rudiment. >> This means merging the AIX specific content of >> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp >> into the corresponding gcc files on the on side and removing the >> defined(TARGET_COMPILER_xlc) blocks in the code, because the >> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX >> compiler. >> The rest of the changes are needed because of using >> utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about >> ill formatted printf > > src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 50: > >> 48: #undef malloc >> 49: extern void *malloc(size_t) asm("vec_malloc"); >> 50: #endif > > This `#if` is not needed if we are building on AIX 7.2 TL5 SP7 or higher. > This is the case in our build environment, but I think IBM is still building > with SP5, which would run into build errors if I remove this `#if` now. While looking at this, I noticed that my question in https://github.com/openjdk/jdk/pull/14146#discussion_r1207078176 and followups had never been answered. Do you know the answers now? Quoting myself: > So, we do this only for malloc? Not for calloc, posix_memalign, realloc etc? > What about free? > Removing that define and hard-coding it here assumes ... pointers it returns > work with the unchanged free() and realloc() the system provides, and will > always do so. > I am basically worried that undefining malloc, even if it seems harmless now, > exposes us to difficult-to-investigate problems down the road, since it > depends on how the libc devs will reform those macros in the future. - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544199335
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc
On Fri, 29 Mar 2024 07:18:47 GMT, Thomas Stuefe wrote: >> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building >> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect >> clang by another name, and it uses the clang toolchain in the JDK build. >> Thus the old xlc toolchain was removed by >> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701). >> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the >> last xlc rudiment. >> This means merging the AIX specific content of >> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp >> into the corresponding gcc files on the on side and removing the >> defined(TARGET_COMPILER_xlc) blocks in the code, because the >> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX >> compiler. >> The rest of the changes are needed because of using >> utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about >> ill formatted printf > > src/hotspot/os/aix/loadlib_aix.cpp line 120: > >> 118: (lm->is_in_vm ? '*' : ' '), >> 119: (uintptr_t)lm->text, (uintptr_t)lm->text + lm->text_len, >> 120: (uintptr_t)lm->data, (uintptr_t)lm->data + lm->data_len, > > Please don't cast, use `p2i()`. Check copyrights in this file and all others. Adapt SAP and Oracle copyrights. > src/hotspot/os/aix/os_aix.cpp line 651: > >> 649: lt.print("Thread is alive (tid: " UINTX_FORMAT ", kernel thread id: >> " UINTX_FORMAT >> 650: ", stack [" PTR_FORMAT " - " PTR_FORMAT " (" SIZE_FORMAT >> "k using %luk pages)).", >> 651: os::current_thread_id(), (uintx) kernel_thread_id, >> (uintptr_t)low_address, (uintptr_t)high_address, > > Use p2i, not cast Here, and in other places too where you cast a pointer to fit into PTR_FORMAT or INTPTR_FORMAT - PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544172412 PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544181011
Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc
On Thu, 28 Mar 2024 16:50:20 GMT, Joachim Kern wrote: > As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building > the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect > clang by another name, and it uses the clang toolchain in the JDK build. Thus > the old xlc toolchain was removed by > [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701). > Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the > last xlc rudiment. > This means merging the AIX specific content of > utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp > into the corresponding gcc files on the on side and removing the > defined(TARGET_COMPILER_xlc) blocks in the code, because the > defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX > compiler. > The rest of the changes are needed because of using > utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about > ill formatted printf Hi @JoKern65 , this is a nice simplification! There are many casts that should be unnecessary (casting size_t to fit into SIZE_FORMAT) or that should use `p2i` (all those casts to fit pointers into PTR_FORMAT or INTPTR_FORMAT). I did not mark all of them. We also have a new macro for printing memory ranges, RANGEFMT and RANGEFMTARGS. Maybe some call sites could that instead and make the code shorter and better to read? But I leave that up to you. Cheers, and happy Easter! src/hotspot/os/aix/loadlib_aix.cpp line 120: > 118: (lm->is_in_vm ? '*' : ' '), > 119: (uintptr_t)lm->text, (uintptr_t)lm->text + lm->text_len, > 120: (uintptr_t)lm->data, (uintptr_t)lm->data + lm->data_len, Please don't cast, use `p2i()`. src/hotspot/os/aix/os_aix.cpp line 314: > 312: ErrnoPreserver ep; > 313: log_trace(os, map)("disclaim failed: " RANGEFMT " errno=(%s)", > 314: RANGEFMTARGS(p, (long)maxDisclaimSize), Wait, why are these casts needed? maxDisclaimSize is size_t, RANGEFMT uses SIZE_FORMAT. That should work without cast. src/hotspot/os/aix/os_aix.cpp line 651: > 649: lt.print("Thread is alive (tid: " UINTX_FORMAT ", kernel thread id: > " UINTX_FORMAT > 650: ", stack [" PTR_FORMAT " - " PTR_FORMAT " (" SIZE_FORMAT "k > using %luk pages)).", > 651: os::current_thread_id(), (uintx) kernel_thread_id, > (uintptr_t)low_address, (uintptr_t)high_address, Use p2i, not cast src/hotspot/os/aix/os_aix.cpp line 1212: > 1210: st->print_cr("physical free : " SIZE_FORMAT, (unsigned > long)mi.real_free); > 1211: st->print_cr("swap total : " SIZE_FORMAT, (unsigned > long)mi.pgsp_total); > 1212: st->print_cr("swap free : " SIZE_FORMAT, (unsigned > long)mi.pgsp_free); A better way to do this would be to change AIX::meminfo to use size_t. We should have done this when introducing that API. src/hotspot/os/aix/os_aix.cpp line 1399: > 1397: os->print("[" PTR_FORMAT " - " PTR_FORMAT "] (" UINTX_FORMAT > 1398: " bytes, %ld %s pages), %s", > 1399: (uintptr_t)addr, (uintptr_t)addr + size - 1, size, size / > pagesize, describe_pagesize(pagesize), p2i src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 62: > 60: #include > 61: > 62: #if defined(LINUX) || defined(_ALLBSD_SOURCE) || defined(_AIX) What else is left? Could we just remove this line altogether now? src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 83: > 81: #error "xlc version not supported, macro __open_xl_version__ not found" > 82: #endif > 83: #endif // AIX Can probably be shortened like this: Suggestion: #ifdef _AIX #if !defined(__open_xl_version__) || (__open_xl_version__ < 17) #error "this xlc version is not supported" #endif #endif // AIX src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 103: > 101: #endif > 102: > 103: #if !defined(LINUX) && !defined(_ALLBSD_SOURCE) && !defined(_AIX) I believe this whole section can be removed now. At least I have no idea who this is for. What gcc versions does OpenJDK still support, then, beside these platforms. Also, any gcc platform not on linux or bsd would have hit the #error below at line 132. src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 128: > 126: #if defined(__APPLE__) > 127: inline int g_isnan(double f) { return isnan(f); } > 128: #elif defined(LINUX) || defined(_ALLBSD_SOURCE) || defined(_AIX) I think this can be just #else src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 131: > 129: inline int g_isnan(float f) { return isnan(f); } > 130: inline int g_isnan(double f) { return isnan(f); } > 131: #else and this can be removed - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18536#pullrequestreview-1967997963 PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544171741 PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544174303 PR Review
Re: jmx-dev RFR: 8327505: Test com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.java fails
On Tue, 19 Mar 2024 16:23:27 GMT, Kevin Walls wrote: > Client.java has a fixed 30-second timeout on the CountDownLatch to wait for > 10 notifications. > > If it fails, you can't tell if CountDownLatch.await threw, or returned false > and the app threw InterruptedException, due to the way Client.java handles > these. > > Seems most likely the 30 second wait expired, as we are dealing with -Xcomp > failures in a debug build. Passing runs can take a few seconds, but can be > 25 seconds. > > Increasing the timeout and tidying up the handling so we can see the specific > reason in future. Looks good! - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18381#pullrequestreview-1965321788
Re: RFR: 8327505: Test com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.java fails
On Tue, 19 Mar 2024 16:23:27 GMT, Kevin Walls wrote: > Client.java has a fixed 30-second timeout on the CountDownLatch to wait for > 10 notifications. > > If it fails, you can't tell if CountDownLatch.await threw, or returned false > and the app threw InterruptedException, due to the way Client.java handles > these. > > Seems most likely the 30 second wait expired, as we are dealing with -Xcomp > failures in a debug build. Passing runs can take a few seconds, but can be > 25 seconds. > > Increasing the timeout and tidying up the handling so we can see the specific > reason in future. Looks good! - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18381#pullrequestreview-1965321788
Re: RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v10]
On Wed, 27 Mar 2024 13:44:42 GMT, Matthias Baesken wrote: >> Currently jcmd command GC.heap_dump only works with an additionally provided >> file name. >> Syntax : GC.heap_dump [options] >> >> In case the JVM has the XX - flag HeapDumpPath set, we should support an >> additional mode where the is optional. >> In case the filename is NOT set, we take the HeapDumpPath (file or >> directory); >> >> new syntax : >> GC.heap_dump [options] .. has precedence over second option >> GC.heap_dump [options] …in case -XX: HeapDumpPath=p is set >> >> This would be a simplification e.g. for support cases where a filename or >> directory is set at JVM startup with -XX: HeapDumpPath=p and writing to the >> path is intended/recommended for usage also in the jcmd case. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > adjust java.1 man page To weight in on the side of this patch, in most customer scenarios I have seen, there is just one location for heap dumps. Using the same location for OOM and for manual heap dumps seems logical to me. @dholmes-ora > Notwithstanding that there may be cases where this change would be > convenient, I really don't like this coupling between the jcmd behaviour and > a -XX flag that is intended for something else. It doesn't completely mesh > with the jcmd usage and other options, and the documentation story is quite > complicated. Wouldn't this just be a case of changing a flag description? As luck has it, the flag already has a generic name that is not tied to OOMs. - PR Comment: https://git.openjdk.org/jdk/pull/18190#issuecomment-2024535044
Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]
On Tue, 26 Mar 2024 21:55:38 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.inspect" to implement access to detailed JVM object >> information. >> >> Not recommended for live production use. Requires UnlockDiagnosticVMOptions >> and not included in jcmd help output, to remind us this is not a >> general-purpose customer-facing tool. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > Undo include I am still feeling uneasy about this new command. I can see its potential usefulness, but as stated before would prefer it being limited to debug-only JVMs. - jcmd can be exposed remotely, e.g., via jmx. I am not sure whether other solutions exist, but exposing jcmd beyond the immediate box the JVM runs on is something many folks want, and that is technically easy to do. So, customer-local solutions may exist for that, and the reach of jcmd may be larger than we think. - the underlying functionality for print_location was written with debugging and error analysis in mind. We keep adding functionality there. I don't think developers adding to that function have in mind that this functionality may be exposed, possibly remotely. In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard since it guards a whole swathe of switches that we may instruct the customer to enable. Once enabled, my experience is that UnlockDiagnosticVMOptions often lingers around. It is not unusual for customer scenarios to have set +UnlockDiagnosticVMOptions because of some years ago support cases. If others feel this command is safe enough, I'll shut up and be quiet, since I cannot think up a concrete attack scenario. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2022310874
Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]
On Mon, 18 Mar 2024 13:14:41 GMT, Richard Reingruber wrote: >> This pr changes `JfrJvmtiAgent::retransform_classes()` and >> `jfr_set_enabled` to switch to `WXWrite` before transitioning to the vm. >> >> Testing: >> make test TEST=jdk/jfr/event/runtime/TestClassLoadEvent.java >> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync >> make test TEST=compiler/intrinsics/klass/CastNullCheckDroppingsTest.java >> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync >> >> More tests are pending. > > Richard Reingruber has updated the pull request with a new target base due to > a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains three additional > commits since the last revision: > > - Merge branch 'master' into 8327990_jfr_enters_vm_without_wxwrite > - Set WXWrite at more locations > - Switch to WXWrite before entering the vm I think this patch makes sense, and does not compete with a long-term solution. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18238#pullrequestreview-1947069985
Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]
On Tue, 19 Mar 2024 12:17:22 GMT, David Holmes wrote: >> Instead, could we tag code that needs one or the other, keep track of the >> current WX state in thread-local memory, and flip WX only when we know we >> need to? The first part we already do. I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation. (Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow). > > That's very odd. The example there doesn't even involve MAP_JIT memory, so > > what does it have to do with WX? > > @theRealAph that is the mystery we hope will be resolved once we know the > nature of the underlying OS bug. Somehow switching to exec mode > fixes/works-around the issue. I can imagine a missing conditional to check if > the region is MAP_JIT. > > > Changing WX at VM state transitions is a form of temporal coupling, a > > classic design smell that has caused problems for decades. > > The original introducers of WXEnable made the decision that the VM should be > in WRITE mode unless it needs EXEC. That is the state we are presently trying > to achieve with this change. If that original design choice is wrong then ... > > > Instead, could we tag code that needs one or the other, keep track of the > > current WX state in thread-local memory, and flip WX only when we know we > > need to? > > And I've asked about this every time a missing WXEnable has had to be added. > We seem to be generically able to describe what kind of code needs which > mode, but we seem to struggle to pin it down. Though that is what > https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing. > > > That'd (by definition) reduce the number of transitions to the minimum if > > we were through. > > Not necessarily. It may well remove some transitions from paths that don't > need it, but if you move the state change too low down the call chain you > could end up transitioning much more often in code that does need it e.g. if > a transitioning method is called in a loop. Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway. - PR Comment: https://git.openjdk.org/jdk/pull/18238#issuecomment-2007469410
Re: RFR: 8327675: jspawnhelper should be built on all unix platforms
On Fri, 8 Mar 2024 10:17:08 GMT, Magnus Ihse Bursie wrote: > We should match the building of jspawnhelper in > make/modules/java.base/Launcher.gmk with the usage for all unix platforms in > src/java.base/unix/classes/java/lang/ProcessImpl.java. > > Granted, the list of OSes in the makefile amounts to the current list of all > Unix OSes in the JDK, but there is no need to have this logical disparity. > > This was inspired by the discovery in > https://github.com/openjdk/jdk/pull/18112#discussion_r1517455696. +1 - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18165#pullrequestreview-1924593320
Re: RFR: 8327675: jspawnhelper should be built on all unix platforms
On Fri, 8 Mar 2024 10:17:08 GMT, Magnus Ihse Bursie wrote: > We should match the building of jspawnhelper in > make/modules/java.base/Launcher.gmk with the usage for all unix platforms in > src/java.base/unix/classes/java/lang/ProcessImpl.java. > > Granted, the list of OSes in the makefile amounts to the current list of all > Unix OSes in the JDK, but there is no need to have this logical disparity. > > This was inspired by the discovery in > https://github.com/openjdk/jdk/pull/18112#discussion_r1517455696. +1 - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18165#pullrequestreview-1924593320
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v7]
On Tue, 5 Mar 2024 11:31:13 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.debug" to implement access to a useful set of the >> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...". >> >> Not recommended for live production use. Calling these "debug" utilities, >> and not including them in the jcmd help output, is to remind us they are not >> general customer-facing tools. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > Test update Hi Kevin, > > (If you find existing jcmds a mess, feel free to suggest changes as > appropriate.) you touch on a problem here, and why I think adding commands should be done more carefully (and I am guilty of adding commands too). Once these commands are rolled out, they find themselves in blogs, knowledge bases, scripts that are reused for different JDK releases, and so on. It is difficult to change them post-release. That is why good names are important. Cheers, Thomas - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1980807020
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]
On Mon, 4 Mar 2024 22:02:53 GMT, Kevin Walls wrote: >> Kevin Walls has updated the pull request incrementally with three additional >> commits since the last revision: >> >> - Usage correction >> - Help to clarify this is VM inspection. Comment to relate source to >> debug.cpp. >> - jcheck trailing whitespace > > Hi Thomas @tstuefe - > > Security concerns certainly needed some thought. We have remote access to > DiagnosticCommands over JMX. I don't see a particular need for that for this > command. I think it was keeping the DCmd flagged as hidden that hides it > from the list of dcmds available in that way. So we have the attach api > controls using local user identity, as we do for everything else. > > Yes, "events" is not that useful, I should remove it. I was taking the > useful parts of debug.cpp, and although the events were in VM.info, I had > missed that since your JDK-8224600, VM.events is its own command now! > > Thread.print with stacks, vs "VM.debug threads" which is just a thread list, > so you have e.g. JavaThread*, name, native id, stack range. > > "find": Usage at the moment, would likely be MessageBoxOnError, and a native > debugger to get the native stacktrace and parameters that include an object > reference you care about. It might be other jcmds if e.g. events capture a > relevant problematic pointer, but likely it involves a native debugger. > Using jcmd behaves better, showing the output where you run jcmd, not in the > VM's current output as the debug.cpp utils do. > Native debugger syntax to call abritrary functions can be awkward, > particularly on Windows, so the jcmd should be a better experience. > > This might be more compelling in post-mortem usage. Am working on that. > i.e. jcmd on a core file. But I am saying it offers some value today. > > The class/method finders, I've heard some enthusiasm for their inclusion. We > don't want to encourage overlap but yes we do have some overlapping jcmds. > > > This is an umbrella but I don't think it's vague. VMDebugDcmd is for > inspecting VM state. It's inspired by debug.cpp utilities, does not need to > implement all of them, but does aim to make them more accessible (I will > assert that they are not well known, which is hard to prove.) > > Do we have a problem with jcmd feature creep? If anything has crept too far > it can be addressed. Looks like the DCmd classes have approximately doubled > since jdk8u but this looks like growth in a good way. > > Thanks > Kevin Hi Kevin, @kevinjwalls , thank you for your explanations. Please find answers inline. > Security concerns certainly needed some thought. We have remote access to > DiagnosticCommands over JMX. I don't see a particular need for that for this > command. I think it was keeping the DCmd flagged as hidden that hides it from > the list of dcmds available in that way. So we have the attach api controls > using local user identity, as we do for everything else. > > Yes, "events" is not that useful, I should remove it. I was taking the useful > parts of debug.cpp, and although the events were in VM.info, I had missed > that since your JDK-8224600, VM.events is its own command now! > > Thread.print with stacks, vs "VM.debug threads" which is just a thread list, > so you have e.g. JavaThread*, name, native id, stack range. > > "find": Usage at the moment, would likely be MessageBoxOnError, and a native > debugger to get the native stacktrace and parameters that include an object > reference you care about. It might be other jcmds if e.g. events capture a > relevant problematic pointer, but likely it involves a native debugger. Using > jcmd behaves better, showing the output where you run jcmd, not in the VM's > current output as the debug.cpp utils do. Native debugger syntax to call > abritrary functions can be awkward, particularly on Windows, so the jcmd > should be a better experience. I remain sceptic here, because in my experience, once you start poking at the JVM innards at this level, I guess you will be quickly at the limit of what this command can do for you and need to attach a debugger anyway. Could this be an own command, e,g, `VM.inspect`, and possibly limited to debug VMs? Do we really need this feature in production? > > This might be more compelling in post-mortem usage. Am working on that. i.e. > jcmd on a core file. But I am saying it offers some value today. Don't we have jhsdb for that? > > The class/method finders, I've heard some enthusiasm for their inclusion. We > don't want to encourage overlap but yes we do have some overlapping jcmds. Yeah, I can see this being useful. > > This is an umbrella but I don't think it's vague. VMDebugDcmd is for > inspecting VM state. It's inspired by debug.cpp utilities, does not need to > implement all of them, but does aim to make them more accessible (I will > assert that they are not well known, which is hard to prove.) > > Do we have
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]
On Mon, 4 Mar 2024 15:10:12 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.debug" to implement access to a useful set of the >> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...". >> >> Not recommended for live production use. Calling these "debug" utilities, >> and not including them in the jcmd help output, is to remind us they are not >> general customer-facing tools. > > Kevin Walls has updated the pull request incrementally with three additional > commits since the last revision: > > - Usage correction > - Help to clarify this is VM inspection. Comment to relate source to > debug.cpp. > - jcheck trailing whitespace Of the sub commands given in the CSR: eventsHotSpot JVM Event log threads Thread list find ADDRESS Describe an address given by the argument findclass CLASS_PATTERN FLAGS findmethod CLASS_PATTERN METHOD_PATTERN FLAGS 1) `events` is redundant (we have VM.events, and events are printed as part of VM.info) 2) `threads` is handled by Thread.info 3) I am still puzzled about `find`. This is to ask a running VM about an arbitrary address. But how do you get such an address? Usually only via debugging. Then I am already attached with a system debugger. Under which circumstances would this be needed? Can you give a usage example? 4) and 5) shared some overlap with VM.classes, VM.class_hierarchy, VM.metaspace and various Compiler.xx commands. I am mostly worried about adding such a low-level debug command to jcmd. Especially in such a vague umbrella form. Experiences show that these commands get extended easily, often without CSR. So its an open door for feature creep. I mostly worry about unforeseen security consequences, especially if jcmd reach is extended across machine boundaries. Weighted against that, the benefits seem a bit slim to me. Again, real-world use cases would help. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1976941359
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v26]
On Mon, 26 Feb 2024 11:24:13 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > >Address review comments Okay - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1902581556
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v25]
On Tue, 20 Feb 2024 09:27:15 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > >remove space Changes requested by stuefe (Reviewer). src/hotspot/os/aix/os_aix.cpp line 1167: > 1165: Load "libfilename.so" first, then load libfilename.a, on failure. > 1166: In OpenJ9, the libary with .so extension is loaded first and then .a > extension, on failure. > 1167: */ Wrong block comment, but the comment itself is also off now. Suggestion: Suggestion: // Load library named // If filename matches .so, and loading fails, repeat with .a. src/hotspot/os/aix/os_aix.cpp line 1173: > 1171: char* const pointer_to_dot = strrchr(file_path, '.'); > 1172: char const *old_extension = ".so"; > 1173: char const *new_extension = ".a"; Suggestion: char* const file_path = strdup(filename); char* const pointer_to_dot = strrchr(file_path, '.'); const char old_extension[] = ".so"; const char new_extension[] = ".a"; STATIC_ASSERT(sizeof(old_exception) >= sizeof(new_exception)); and remove runtime-assert below src/hotspot/os/aix/os_aix.cpp line 1178: > 1176: FREE_C_HEAP_ARRAY(char, file_path); > 1177: return result; > 1178: } I would remove this whole section since it's a functional change not covered by the issue description and unnecessary for your fix. It may also be wrong: loading shared objects without extension may be perfectly valid. The extension is just a convention. src/hotspot/os/aix/os_aix.cpp line 1183: > 1181: // If the load fails,we try to reload by changing the extension to .a > for .so files only. > 1182: // Shared object in .so format dont have braces, hence they get > removed for archives with members. > 1183: if (result == nullptr) { Suggestion: if (result == nullptr && pointer_to_dot != nullptr && strcmp(pointer_to_dot, old_extension) == 0) { src/hotspot/os/aix/os_aix.cpp line 1184: > 1182: // Shared object in .so format dont have braces, hence they get > removed for archives with members. > 1183: if (result == nullptr) { > 1184: assert(strlen(new_extension) < strlen(old_extension), "New > extension length must be less than existing one"); Can be removed if you do the STATIC_ASSERT suggested above. src/hotspot/os/aix/os_aix.cpp line 1186: > 1184: assert(strlen(new_extension) < strlen(old_extension), "New > extension length must be less than existing one"); > 1185: if (strcmp(pointer_to_dot, old_extension) == 0) { > 1186: sprintf(pointer_to_dot, "%s", new_extension); Use os::snprintf, and limit output buffer by size of old extension. - PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1899612274 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757595 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754431 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754774 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757736 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757772 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757844
Re: RFR: JDK-8252802: java launcher should set MALLOCOPTIONS and LDR_CNTRL in order to use 64KB pages on AIX
On Mon, 19 Feb 2024 05:52:22 GMT, Varada M wrote: > DeCapo Benchmark Results (3 runs) : > > Before : > = DaCapo 9.12 h2 PASSED in 281402 msec = > = DaCapo 9.12 h2 PASSED in 269818 msec = > = DaCapo 9.12 h2 PASSED in 279279 msec = > > After: > = DaCapo 9.12 h2 PASSED in 279192 msec = > = DaCapo 9.12 h2 PASSED in 269769 msec = > = DaCapo 9.12 h2 PASSED in 271577 msec = > > Environmental variables LDR_CNTRL and MALLOCOPTIONS has caused > test/jdk/java/lang/ProcessBuilder/Basic.java failure. > Additional environmental variables has to removed from > removeAixExpectedVars(). > > JBS Issue : [JDK-8252802](https://bugs.openjdk.org/browse/JDK-8252802) I don't think this works as intended. IIRC, both variables must be set *before process invocation*. Setting them inside the launcher will only affect child processes. The dacapo results posted seem to support this, they seem random-ish to me. Note that we already use 64KB pages for all large memory regions (everything that goes through `os::reserve_memory`). So, while the value of LDRCNTRL is not nil, it is very diminished. Mostly just the Heap is affected. But I would not change defaults for options like these anyway, especially not hard-coded. These knobs have far-ranging implications. If we want that, we need to investigate carefully. (for example: would using 64KB pages increase average heap size? Possibly, depending on the implementation. But AIX still has this problem where the heap can only live in the break and can bump against low-hanging regions. So, AIX is especially vulnerable against any changes that increase average heap usage) - PR Comment: https://git.openjdk.org/jdk/pull/17906#issuecomment-1953643518
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v23]
On Mon, 19 Feb 2024 13:25:28 GMT, Joachim Kern wrote: >> Thanks for the detailed explanation @JoKern65 . Do then in this errno check >> may not be necessary ? or can we still set errno and access it some way ? > > In this special case here I would not use errno, but the string returned in > ebuf, in case the result is nullptr. Argh, I keep forgetting `dlopen` does not set errno (at least its not guaranteed by the standard). Ok, in this case, I would not analyze the string either because it may be locale-dependent. Never mind then; I'd thought this would be easy. One could probably use load() instead of dlopen() on AIX, but that change would require more investigations to see if load() and dlopen() are equivalent. Okay, in that case just skip the errno test and call the second dlopen directly. Lets hope for the best then. - PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1494586044
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Fri, 16 Feb 2024 12:25:39 GMT, Suchismith Roy wrote: > > > > Hi, > > > > some remarks: > > > > > > > > * there is no need for a copy for the first call to dll_load_library. > > > > Just hand in the string 1:1. > > > > * I would only do the *.a fallback loading if the error indicates that > > > > the *.so file had not been there. So, only if EACCESS or ENOENT; in all > > > > other cases I would not do the fallback. E.g. if the *.so file cannot > > > > be loaded due to a header mismatch. See > > > > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines > > > > * Please use os::strdup. > > > > * Please assert that the replacement string is smaller than the > > > > original string (which it should be, *.so is longer than *.a, but this > > > > is insurance against anyone changing the code in the future) > > > > > > > > Thank you, Thomas > > > > > > > > > Sure working on them. May i know why we are using the load routine in the > > > 2nd point ? . Currently we do a *.a fallback only when dlopen fails. Does > > > load function save some steps here ? > > > > > > I don't understand the question, sorry. > > What I mean is when the first dlopen fails AND its error indicates the > > shared library had been missing, only then attempt the *.a fallback. > > I see. I think i was referred to the init routine in the link. Ah, okay. No, IBMs dlopen page for AIX refers to this page for the list of possible error codes; I assume dlopen just uses these functions under the hood. > So you mean based on the errors inside dll_load_library, we set the errno to > appropriate Error and then check for that before using the fallback , is that > correct ? No. I mean distinguish whether dlopen fails because it could not find the file (a valid reason to retry with *.a) or failed for another reason (which would not be a valid reason to retry with *.a). - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-194840
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Thu, 15 Feb 2024 17:50:23 GMT, Suchismith Roy wrote: > > Hi, > > some remarks: > > > > * there is no need for a copy for the first call to dll_load_library. Just > > hand in the string 1:1. > > * I would only do the *.a fallback loading if the error indicates that the > > *.so file had not been there. So, only if EACCESS or ENOENT; in all other > > cases I would not do the fallback. E.g. if the *.so file cannot be loaded > > due to a header mismatch. See > > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines > > * Please use os::strdup. > > * Please assert that the replacement string is smaller than the original > > string (which it should be, *.so is longer than *.a, but this is insurance > > against anyone changing the code in the future) > > > > Thank you, Thomas > > Sure working on them. May i know why we are using the load routine in the 2nd > point ? . Currently we do a *.a fallback only when dlopen fails. Does load > function save some steps here ? I don't understand the question, sorry. What I mean is when the first dlopen fails AND its error indicates the shared library had been missing, only then attempt the *.a fallback. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1947780121
Re: RFR: 8325878: Require minimum Clang version 13
On Fri, 16 Feb 2024 04:46:24 GMT, Kim Barrett wrote: > > Unfortunately this will break my workflow on Linux - I use clang to build > > on Ubuntu 20.04, which is not that old, but it ships with clang 12. This is > > not a deal breaker, just annoying. > > That's unfortunate, but I think the [[noreturn]] issue is very important. > Until we have that fix, we can't rely on that attribute to silence certain > warnings. This requires us to continue to insert dead returns or apply other > workarounds. And forgetting to do so (because someone makes a change that > works fine with later versions or on other platforms) will lead to > build-breakage JBS issues/PRs just to continue to support older versions of > clang. We've already had several of those. Thank you for explaining the motivation. @forax also mentioned that our workaround produces false warnings in IDEs (just checked in CDT). I'm fine with it - I will find another solution for my Linux box. - PR Comment: https://git.openjdk.org/jdk/pull/17862#issuecomment-1947776707
Re: RFR: 8314488: Compile the JDK as C++17 [v6]
On Mon, 5 Feb 2024 09:44:02 GMT, Magnus Ihse Bursie wrote: > > Sure, you can always install a newer GCC than the system one, but it's > > another thing that makes it harder for people to build OpenJDK. Having said > > that, C++17 is nice to have. > > @theRealAph I am still just hearing hand-waving "perhaps someone somewhere > will have a somewhat harder time building the JDK". Yes, perhaps that is so. > But that is very uncertain, and I have still not heard a single concrete > example of where this would apply. In contrast, going to gcc 10 will clearly > bring a benefit to all other platforms, since it allows us to synchronize the > code base at C++17. > > In light of this, I think we need to hear some really compelling evidence of > problems that would ensue if we raise the minimum to gcc 10. If nobody can > produce such evidence, then to me it is a sign that this fear is not > well-grounded, and we should proceed with this PR. @magicus You put the onus of proving that problems could ensue strictly to the objectors. That is a bit one-sided. I do not see much effort - any, really - put into detailing the motivation for this move, neither in the PR description nor in the JBS issue text. I just read through the whole PR discussion and really the only helpful comment I found was from Kim ( https://github.com/openjdk/jdk/pull/14988#issuecomment-1858136247 ). I think important decisions like enforcing C++17 would benefit from a more careful preparation. - PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946628523
Re: RFR: 8314488: Compile the JDK as C++17 [v6]
On Thu, 15 Feb 2024 15:54:56 GMT, Magnus Ihse Bursie wrote: > > I would like it if toolchain version bumps were discussed somewhere else > > first, not in a PR. (And apologies if it was and I missed that discussion). > > Yes, it definitely was. I posted a separate [mail to > build-dev](https://mail.openjdk.org/pipermail/build-dev/2024-January/042802.html) > with subject "Raising the minimum version of gcc, clang and xlc". I don't > think we can make it more clear than that. Okay, thank you for that clarification. I clearly missed it then. - PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946396577
Re: RFR: 8314488: Compile the JDK as C++17 [v6]
On Thu, 11 Jan 2024 13:23:45 GMT, Julian Waters wrote: >> Compile the JDK as C++17, enabling the use of all C++17 language features > > Julian Waters has updated the pull request incrementally with one additional > commit since the last revision: > > Require clang 13 in toolchain.m4 Just on a general note: I would like it if toolchain version bumps were discussed somewhere else first, not in a PR. (And apologies if it was and I missed that discussion). Because PRs are usually followed only by active developers, but toolchain versions affect a broader part of the community. As we have seen in this PR, there are conflicting interests. We have things like CSRs and JEPs. Both are not ideal - a JEP is way too massive, and a CSR is about the compatibility of the product, not about build-time. Still, maybe something like a CSR would make sense here. I understand that with toolchain support, there will always be opposing parties, and I can see arguments on both sides. We don't want to end up like FreeBSD - stuck in time because of opinion stalemates. I just keep thinking that a PR is maybe not the perfect forum for this. - PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946053820
Re: RFR: 8325878: Require minimum Clang version 13
On Thu, 15 Feb 2024 05:19:45 GMT, Kim Barrett wrote: > Please review this change that updates the minimum supported version of Clang > to be used for building OpenJDK from 3.5 to 13. > > This permits enabling C++17 (JDK-8314488), though Clang 5 might suffice for > that. A minimum of Clang 13 also obtains a critical bug fix for the > [[noreturn]] > attribute (see JDK-8303805). > > Testing: mach5 tier1, which includes building with a recent version of > Xcode/clang. Unfortunately this will break my workflow on Linux - I use clang to build on Ubuntu 20.04, which is not that old, but it ships with clang 12. This is not a deal breaker, just annoying. - PR Comment: https://git.openjdk.org/jdk/pull/17862#issuecomment-1946030283
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Mon, 12 Feb 2024 18:04:21 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > Remove not matched trailing whitespaces Hi, some remarks: - there is no need for a copy for the first call to dll_load_library. Just hand in the string 1:1. - I would only do the *.a fallback loading if the error indicates that the *.so file had not been there. So, only if EACCESS or ENOENT; in all other cases I would not do the fallback. E.g. if the *.so file cannot be loaded due to a header mismatch. See https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines - Please use os::strdup. - Please assert that the replacement string is smaller than the original string (which it should be, *.so is longer than *.a, but this is insurance against anyone changing the code in the future) Thank you, Thomas - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1882051726
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v13]
On Wed, 31 Jan 2024 13:17:21 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > spelling I'm busy with FOSDEM this week and probably next. Will look at this end of next week or the week after. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1921004805
Re: RFR: 8307977: jcmd and jstack broken for target processes running with elevated capabilities
On Tue, 30 Jan 2024 10:47:22 GMT, Sebastian Lövdahl wrote: > 8307977: jcmd and jstack broken for target processes running with elevated > capabilities ping @jerboaa - PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1916676356
Re: RFR: 8324659: GHA: Generic jtreg errors are not reported
On Thu, 25 Jan 2024 12:02:35 GMT, Aleksey Shipilev wrote: > When looking at [JDK-8324647](https://bugs.openjdk.org/browse/JDK-8324647), I > was surprised to see that GHA has this output: > > > Running test 'jtreg:test/lib-test:tier1' > /home/runner/work/jdk/jdk/test/lib-test/TEST.groups: group all: group > includes itself > Error: One or more groups are invalid > Finished running test 'jtreg:test/lib-test:tier1' > Test report is stored in > build/run-test-prebuilt/test-results/jtreg_test_lib_test_tier1 > > == > Test summary > == >TEST TOTAL PASS FAIL ERROR >>> jtreg:test/lib-test:tier1 1 0 0 1 << > == > TEST FAILURE > > > ...and yet, the tests are recorded as "green"! I think this is because our > error reporting code is tad buggy. AFAICS, `make test-prebuilt` does not > report a non-zero exit code on failure, and the external script looks for > `build/run-test-prebuilt/make-support/exit-with-error` to check for error. > But it does not _set_ the exit code on its own when the failure is > discovered, neither it sets `failure=true` on generic failure. > > I think at very minimum we should report `failure=true` on generic failure. I > thought about reporting non-zero exit code from the parsing script, but it > does not feel clean to return non-zero exit code from the _parsing script_. > Non-zero exit code from that script should signify the error in the script > itself. > > Additional testing: > - [x] Current GHA with lib-test failure (now red) > - [x] Current GHA with lib-test fix (now green) > - [x] GHA with artificial errors: > https://github.com/shipilev/jdk/actions/runs/7652797646/job/20854472816 Ouch. Wow. +1. Is this problem also in older releases? - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17568#pullrequestreview-1845283123
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]
On Thu, 25 Jan 2024 11:04:03 GMT, Suchismith Roy wrote: > > For me the unresolved question is still: > > > > * do we want an unconditional load of *.a for a given *.so (have yet to see > > any documentation for this a-file duality) > > Yes. The documentation link - > https://www.ibm.com/docs/en/aix/7.3?topic=memory-shared-objects-run-time-linking > The text **In dynamic mode, input files specified with the -l flag may end > in .so, as well as in .a. That is, a reference to -lfoo is satisfied by the > first libfoo.so or libfoo.a found in any of the directories being searched. > Dynamic mode is in effect by default unless the -bstatic option is used.** > > https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command > > Archive files are composite objects, which usually contain import files and > object files, including shared objects. If an archive file contains another > archive file or a member whose type is not recognized, the ld command issues > a warning and ignores the unrecognized member. If an object file contained in > an archive file has the F_LOADONLY bit set in the XCOFF header, the ld > command ignores the member. This bit is usually used to designate old > versions of shared objects that remain in the archive file to allow existing > applications to load and run. New applications link with the new version of > the shared object, that is, another member of the archive. Excellent, thank you. > > > * if we do, do we want that to be bidirectional? Someone specifies *.a, do > > we want to attempt to load *.so? > > Considering the different scenarios, loading .a after .so failure should > suffice. I got a chance to look at the right file in OpenJ9-omr ,which has a > native code which does an attempt to load archive files after trying to load > .so files. This code was always there and it explains why the issue did not > occur in Semeru, which is derived from this repository. Okay. We don't have to be better than J9 then. If they do it, we should too. So, for the following input, we do: "library.so" -> load "library.so", then "library.a" "library"-> load "library.so", then "library.a" ? "library.a" -> only load "library.a" ? (*) > > > When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I > > would like a clear documentation as a comment in os_aix.cpp explaining the > > logic and referencing the relevant OpenJ9 files. > > Any example comment you can refer ? I mean i just mention the file name in > OpenJ9 and explain the logic ? Let me know for any further clarifications Just reference the excerpts you mentioned above, then describe your intended logic. Example: "When loading .so, upon failure we attempt to load . When loading a library given without extension, ..." Explaining the logic makes it easy to see for the casual code reader what your intent is, that you have thought of all cases (*), and makes it possible to check the coding against your intent. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1911652995
Re: jmx-dev RFR: JDK-8324637: [aix] Implement support for reporting swap space in jdk.management
On Thu, 25 Jan 2024 12:30:15 GMT, Matthias Baesken wrote: > The get_total_or_available_swap_space_size coding misses AIX support, we only > return 0. This should be enhanced. > The perfstat API can be used, see > https://www.ibm.com/docs/pt/aix/7.2?topic=interfaces-perfstat-memory-total-interface > . Small nit, otherwise good. src/jdk.management/unix/native/libmanagement_ext/OperatingSystemImpl.c line 113: > 111: throw_internal_error(env, "perfstat_memory_total failed"); > 112: } > 113: return available ? (jlong)(memory_info.pgsp_free * 4L * 1024L) : > (jlong)(memory_info.pgsp_total * 4L * 1024L); Do we need the cast? perfstat_memory_total_t members are all 64-bit, no? Also, can we shorten this to: return (available ? memory_info.pgsp_free : memory_info.pgsp_total) * 4096; - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17569#pullrequestreview-1843868729 PR Review Comment: https://git.openjdk.org/jdk/pull/17569#discussion_r1466454083
Re: RFR: JDK-8324637: [aix] Implement support for reporting swap space in jdk.management
On Thu, 25 Jan 2024 12:30:15 GMT, Matthias Baesken wrote: > The get_total_or_available_swap_space_size coding misses AIX support, we only > return 0. This should be enhanced. > The perfstat API can be used, see > https://www.ibm.com/docs/pt/aix/7.2?topic=interfaces-perfstat-memory-total-interface > . Small nit, otherwise good. src/jdk.management/unix/native/libmanagement_ext/OperatingSystemImpl.c line 113: > 111: throw_internal_error(env, "perfstat_memory_total failed"); > 112: } > 113: return available ? (jlong)(memory_info.pgsp_free * 4L * 1024L) : > (jlong)(memory_info.pgsp_total * 4L * 1024L); Do we need the cast? perfstat_memory_total_t members are all 64-bit, no? Also, can we shorten this to: return (available ? memory_info.pgsp_free : memory_info.pgsp_total) * 4096; - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17569#pullrequestreview-1843868729 PR Review Comment: https://git.openjdk.org/jdk/pull/17569#discussion_r1466454083
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]
On Tue, 16 Jan 2024 08:36:49 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with three > additional commits since the last revision: > > - Update porting_aix.cpp > - Update porting_aix.cpp > - Update os_aix.cpp For me the unresolved question is still: - do we want an unconditional load of *.a for a given *.so (have yet to see any documentation for this a-file duality) - if we do, do we want that to be bidirectional? Someone specifies *.a, do we want to attempt to load *.so? When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I would like a clear documentation as a comment in os_aix.cpp explaining the logic and referencing the relevant OpenJ9 files. There are some minor issues with the code itself, but I will defer reviewing. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1907550400
Re: RFR: JDK-8323330: [BACKOUT] JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows
On Wed, 10 Jan 2024 09:18:53 GMT, Matthias Baesken wrote: > There have been concerns raised about > [JDK-8276809](https://bugs.openjdk.org/browse/JDK-8276809) , so bring back > the old behavior. okay - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17341#pullrequestreview-1814102655
Re: RFR: JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows
On Tue, 2 Jan 2024 15:03:12 GMT, Matthias Baesken wrote: > The new test java/awt/font/JNICheck/FreeTypeScalerJNICheck.java introduced > with https://bugs.openjdk.java.net/browse/JDK-8269223 adds -Xcheck:jni , and > shows on Windows server 2019 the following JNI warning , so the test fails on > this Windows version. > > stdout: [WARNING in native method: JNI call made without checking exceptions > when required to from CallStaticVoidMethodV > at > sun.awt.Win32GraphicsEnvironment.initDisplay(java.desktop@23-internal/Native > Method) > at > sun.awt.Win32GraphicsEnvironment.initDisplayWrapper(java.desktop@23-internal/Win32GraphicsEnvironment.java:95) > at > sun.awt.Win32GraphicsEnvironment.(java.desktop@23-internal/Win32GraphicsEnvironment.java:63) > at > sun.awt.PlatformGraphicsInfo.createGE(java.desktop@23-internal/PlatformGraphicsInfo.java:34) > at > java.awt.GraphicsEnvironment$LocalGE.createGE(java.desktop@23-internal/GraphicsEnvironment.java:93) > at > java.awt.GraphicsEnvironment$LocalGE.(java.desktop@23-internal/GraphicsEnvironment.java:84) > at > java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(java.desktop@23-internal/GraphicsEnvironment.java:106) > at FreeTypeScalerJNICheck.runTest(FreeTypeScalerJNICheck.java:53) > at FreeTypeScalerJNICheck.main(FreeTypeScalerJNICheck.java:44) > > > We better add an exception check to get rid of the JNI warning (and also of > the test failure) . Hi Phil, sorry for this, this was not bad intent, just a plain mistake. It is somewhat difficult to remember the exact review rules per project. Help from Skara would certainly be useful here. Cheers, Thomas - PR Comment: https://git.openjdk.org/jdk/pull/17224#issuecomment-1883471813
Re: RFR: JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows
On Tue, 2 Jan 2024 15:03:12 GMT, Matthias Baesken wrote: > The new test java/awt/font/JNICheck/FreeTypeScalerJNICheck.java introduced > with https://bugs.openjdk.java.net/browse/JDK-8269223 adds -Xcheck:jni , and > shows on Windows server 2019 the following JNI warning , so the test fails on > this Windows version. > > stdout: [WARNING in native method: JNI call made without checking exceptions > when required to from CallStaticVoidMethodV > at > sun.awt.Win32GraphicsEnvironment.initDisplay(java.desktop@23-internal/Native > Method) > at > sun.awt.Win32GraphicsEnvironment.initDisplayWrapper(java.desktop@23-internal/Win32GraphicsEnvironment.java:95) > at > sun.awt.Win32GraphicsEnvironment.(java.desktop@23-internal/Win32GraphicsEnvironment.java:63) > at > sun.awt.PlatformGraphicsInfo.createGE(java.desktop@23-internal/PlatformGraphicsInfo.java:34) > at > java.awt.GraphicsEnvironment$LocalGE.createGE(java.desktop@23-internal/GraphicsEnvironment.java:93) > at > java.awt.GraphicsEnvironment$LocalGE.(java.desktop@23-internal/GraphicsEnvironment.java:84) > at > java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(java.desktop@23-internal/GraphicsEnvironment.java:106) > at FreeTypeScalerJNICheck.runTest(FreeTypeScalerJNICheck.java:53) > at FreeTypeScalerJNICheck.main(FreeTypeScalerJNICheck.java:44) > > > We better add an exception check to get rid of the JNI warning (and also of > the test failure) . +1. I agree with Ralf, logging is not that important. Possibly just add a (c-runtime) assert. Up to you. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17224#pullrequestreview-1808867637
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v10]
On Fri, 22 Dec 2023 14:50:16 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > additional fix of sideeffect reported in JDK-8322691 src/hotspot/os/aix/porting_aix.cpp line 1071: > 1069: if (max_handletable == 0) { > 1070: // First time we allocate memory for 128 Entries > 1071: char* ptmp = (char*)::malloc(128 * sizeof(struct > handletableentry)); No need for malloc. You can start with realloc, since realloc(NULL, ...) is malloc. static handletablentry* tab = nullptr; static unsigned max_handles = 0; ... if (need more handles) unsigned new_max = MAX2(max_handles * 2, init_num_handles); handleentry* new_tab = ::realloc(p_handletable, sizeof(handleentry) * new_max); if (new_tab != nullptr) { max_handles = new_max; tab= new_tab; } - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1435139923
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 11:54:17 GMT, Martin Doerr wrote: >> Dynamic allocation also opens us up to potential initialization issues, >> unless we explicitly use raw ::malloc. It should work, but I think its >> better avoided unless we really need it. > > Well we're fixing an academic issue by introducing another one? Doesn't make > sense to me. Okay, I butt out, I don't care enough. Up to you both to decide what to do. My recommendation would still be to avoid hotspot infrastructure that relies on os::malloc and friends; other than that, rewriting this table to make it growable using realloc should be trivial. Note that we need *some* sort of limit though. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433996779
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 11:23:46 GMT, Joachim Kern wrote: >> I don't like introducing unnecessary limitations. Are we sure nobody will >> ever need more than 1024 handles? >> Can't we at least use a GrowableArray or something like that? > > In principle you are right, but in my opinion 1024 is an academical limit. I > never saw processes with more than a few dozen loaded libraries. Dynamic allocation also opens us up to potential initialization issues, unless we explicitly use raw ::malloc. It should work, but I think its better avoided unless we really need it. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433963889
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]
On Thu, 21 Dec 2023 09:37:55 GMT, Suchismith Roy wrote: > > > > What happens if we accidentally attempt to load a "real" static > > > > library, which is also named *.a? Would dlopen() then crash? What would > > > > happen? > > > > > > > I don't think the problem is with *.a . They would load as the default > > > behaviour of the dlopen. It is only when the dlopen fails for *.so , we > > > give another chance to check for .a file with the same name. > > > > > > No, what I meant, and what must be clarified before going forward with this > > solution, is the following: > > > > * is _every_ `*.a` object on AIX loadable with `dlopen`, and will the > > result be the same as when loading a `*.so` object > > * or, if we present arbitrary `*.a` files to dlopen, is there a chance for > > dlopen to crash or misbehave. > > > > Reason is that I was under the impression that *.a libraries are static > > libraries and cannot be loaded dynamically. This is what you now try to do. > > If we cannot safely answer this question, I would opt for a more narrow > > solution by hard-wiring known alternative names. So, do the second *.a > > attempt only for your `ibm_16_am.a` which you know works. That could also > > be done in a reasonably maintainable manner. > > In AIX, both static and dynamic libraries have *.a extension. And AIX also > supports *.so files.Bascially shared objects in AIX have both *.a and *.so > extension. Hence we need to implement this logic. If we try loading a static > archive specifically ,how the dlopen would behave , that is something > probably @JoKern65 can answer ? Rather, this is a question you have to ask your collegues at IBM that develop the AIX libc. Since AIX libc is not open source, we cannot look for ourselves, nor can Joachim (her works at SAP). > > > > > Does this really have to be handled in the OpenJDK? What does J9 on AIX > > > > do? Could this be done in a simpler way outside OpenJDK, e.g. by > > > > providing an *.so variant of the library in question? Where does this > > > > library come from? > > > > > > > I am not sure how J9 handles this. I would have to consult . > > > > > > J9 is Open Source, can't you just look? :) > > I did try comparing the file structures, and i do not see a similar file > structure over there. I am unable to find the jvmTiAgent code and also os_aix > file. So i am not sure which functions over there are doing the same > functionality. You have any suggestion on how i can check and correlate ? Someone must implement LoadLibrary. Try looking for places where dlopen() is called. > > > > However as per current observation, this issue does not show up on > > > Semuru. This issue is only happening on Adoptium. The team that release > > > these file has always released *.a files which work fine for Semuru. > > > > > > I don't know what Semuru is. What is the context, is that a different VM? > > Also OpenJDK? J9 derived? > > Semuru is J9 derived. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865977132
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 09:37:57 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 916: >> >>> 914: constexpr int max_handletable = 1024; >>> 915: static int g_handletable_used = 0; >>> 916: static struct handletableentry g_handletable[max_handletable] = {{0, >>> 0, 0, 0}}; >> >> Wouldn't `ConcurrentHashTable` be a better data structure? It is already >> used in hotspot, can grow dynamically and doesn't need linear search. > > There will be only few libraries in the list. With this assumption Thomas > suggested to use just a simple array. Let's keep it simple. A linear array of only a few items is easily scanned, probably faster than pointer hopping hash table entries. Not that it matters in any way for the few calls to dlopen. Also, avoiding hotspot structures preserves layer integrity (porting_aix does not pull anything from hotspot so far) and prevents initialisation time dependencies. Not sure whether ConcurrentHashTable works before VM init, but with Joachimes current solution, we can call dlopen at any time in VM life. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433839119
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]
On Thu, 21 Dec 2023 08:16:22 GMT, Suchismith Roy wrote: >> What happens if we accidentally attempt to load a "real" static library, >> which is also named *.a? Would dlopen() then crash? What would happen? > I don't think the problem is with *.a . They would load as the default > behaviour of the dlopen. It is only when the dlopen fails for *.so , we give > another chance to check for .a file with the same name. No, what I meant, and what must be clarified before going forward with this solution, is the following: - is *every* `*.a` object on AIX loadable with `dlopen`, and will the result be the same as when loading a `*.so` object - or, if we present arbitrary `*.a` files to dlopen, is there a chance for dlopen to crash or misbehave. Reason is that I was under the impression that *.a libraries are static libraries and cannot be loaded dynamically. This is what you now try to do. If we cannot safely answer this question, I would opt for a more narrow solution by hard-wiring known alternative names. So, do the second *.a attempt only for your `ibm_16_am.a` which you know works. That could also be done in a reasonably maintainable manner. >> Does this really have to be handled in the OpenJDK? What does J9 on AIX do? >> Could this be done in a simpler way outside OpenJDK, e.g. by providing an >> *.so variant of the library in question? Where does this library come from? > I am not sure how J9 handles this. I would have to consult . J9 is Open Source, can't you just look? :) > However as per current observation, this issue does not show up on Semuru. > This issue is only happening on Adoptium. The team that release these file > has always released *.a files which work fine for Semuru. I don't know what Semuru is. What is the context, is that a different VM? Also OpenJDK? J9 derived? - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865859345
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Wed, 20 Dec 2023 14:53:06 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > improve error handling still ok, small nit inside src/hotspot/os/aix/porting_aix.cpp line 1033: > 1031: // filled by os::dll_load(). This way we mimic dl handle equality for a > library > 1032: // opened a second time, as it is implemented on other platforms. > 1033: void* Aix_dlopen(const char* filename, int Flags, const char** > error_report) { add assert for error_report != nullptr - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1792301031 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433606032
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v7]
On Wed, 20 Dec 2023 11:16:03 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > Spaces fix Hi, some requests and questions: - Please modify the JBS title, PR title, and JBS issue text to reflect that this adds an alternative shared object loading path for shared objects on AIX. Something like "Allow loading shared objects with .a extension on AIX". Please describe the new logic in the JBS issue text. - Does this really have to be handled in the OpenJDK? What does J9 on AIX do? Could this be done in a simpler way outside OpenJDK, e.g. by providing an *.so variant of the library in question? Where does this library come from? - What happens if we accidentally attempt to load a "real" static library, which is also named *.a? Would dlopen() then crash? What would happen? - What happens if the original path handed to os::dll_load is already a *.a file? Should the logic then be reversed? - We really need regression tests for this. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1864572287
Re: RFR: 8314488: Compile the JDK as C++17 [v2]
On Fri, 15 Dec 2023 16:20:02 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains five additional >> commits since the last revision: >> >> - Merge branch 'openjdk:master' into patch-7 >> - Revert vm_version_linux_riscv.cpp >> - vm_version_linux_riscv.cpp >> - allocation.cpp >> - 8310260 > > I agree that before throwing this switch, we need to look at some specific > issues that might need to be addressed, discuss the benefits, and also the > costs. > > As was discussed for the change to C++14, there is *never* a good time to > start introducing the use of new language features as far as backporting is > concerned, unless one is going to backport the language change too. We didn't > do that for C++14, and I don't think we are going to (nor should) do it for > C++17 either. But backporting concerns can't be all powerful, as that will > forever prevent potentially significant improvements. > > I started to make a list of new language features that seem particularly > beneficial or otherwise important. I was going to write style guide updates > for these, but haven't gotten very far with that yet. > > P0035R4: Dynamic memory allocation for over-aligned data > P0135R1: Guaranteed copy elision > P0145R3: Refining Expression Evaluation Order for Idiomatic C++ > P0292R2: constexpr if > P0091R3/P0512R0: Template argument deduction for class templates > > Here are some others that might be of interest to us. > N4268: Allow constant evaluation for all non-type template arguments > N3928: Extending static_assert > P0118R1: [[fallthrough]] attribute > P0189R1: [[nodiscard]] attribute > P0212R1: [[maybe_unused]] attribute > P0170R1: Wording for constexpr lambda > P0283R2: Ignoring unsupported non-standard attributes > P0061R1: __has_include for C++17 > P0386R2: Inline variables @kimbarrett > P0035R4: Dynamic memory allocation for over-aligned data Do we really need this? I ask because, in the end, this will result in something like `posix_memalign` to be called, and I remember it being notorious for causing large footprint overhead depending on how smart the underlying allocator is about using alignment waste. It will also be non-trivial to implement in hotspot since NMT uses malloc headers. Barring a rewrite of NMT malloc metadata tracking (e.g. using a hash map, which would be more costly both in terms of performance and, probably, footprint), malloc headers would have to be revised. Probably would need to be dynamic-sized. This is the reason we did not bother wrapping posix_memalign. - PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1864039399
Re: RFR: 8322417: Console read line with zero out should zero out when throwing exception
On Tue, 19 Dec 2023 12:47:53 GMT, Goetz Lindenmaier wrote: > …g exception > > After leaving the method by throwing an exception the data can not be cleaned > any more. Seems reasonable. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17156#pullrequestreview-1789380436
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Tue, 19 Dec 2023 12:37:33 GMT, Suchismith Roy wrote: >> The libpath parsing code is from me, so no license problems. > > Hi @JoKern65 Is this good to integrate now ? @suchismith1993 > Once this code goes in I can push in my changes. We are targeting the fix for > January. If you talk about https://github.com/openjdk/jdk/pull/16604, no, you cannot push that even if Joachim finishes his work. Your patch has not even a single review, is quite controversial, and none of the issues the reviewers have mentioned are addressed. This needs a lot more discussion time. - PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862704694
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]
On Mon, 18 Dec 2023 13:33:46 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Followed Thomas proposals > > Well done. > > Releasing the mutex before asserting is not necessary; we don't pull the > handle table lock as part of error reporting. > @tstuefe Sorry to tag you. Can you review the code. Once this code goes in I > can push in my changes. We are targeting the fix for January. > Hi @JoKern65 Is this good to integrate now ? @suchismith1993 Please don't put pressure on patch authors and developers. There is zero reason why this patch should be rushed. > Hi @suchismith1993, I'm waiting for a second review. Complex hotspot changes > should be reviewed twice. Not only that, hotspot changes *need* to be reviewed by at least two reviewers. That is not optional. See OpenJDK bylaws. - PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862695052
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 11:12:23 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 1097: >> >>> 1095: } >>> 1096: >>> 1097: pthread_mutex_lock(_handletable_mutex); >> >> You can make your life a lot easier by defining an RAII object at the start >> of the file: >> >> struct TableLocker { >> TableLocker() { pthread_mutex_lock(_handletable_mutex); } >> ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); } >> }; >> >> and just place this at the beginning of your two functions >> >> TableLocker lock: >> ... >> >> >> no need to manually unlock then, with the danger of missing a return. > > Great, thank you. This was one of the things I thought about, but was not > sure, because I did not fully understood the MutexLocker class and the > difference between Monitor and Mutex. In hindsight, pthread mutex is the better choice anyway: it avoids dependencies to the VM lifecycle (VM mutexes are only available after VM initialization, so we could not call dlopen() before that). - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1430380082
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 10:35:48 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 990: >> >>> 988: if (env == nullptr) { >>> 989: // no LIBPATH, try with LD_LIBRARY_PATH >>> 990: env = getenv("LD_LIBRARY_PATH"); >> >> Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX. > > Yes it is, It's the fallback if LIBPATH is not defined In that case there may be errors in other places, since so far we assumed its either one or the other, but not both. Example: https://github.com/openjdk/jdk/blob/a247d0c74bea50f11d24fb5f3576947c6901e567/src/java.base/unix/native/libjli/java_md.c#L43C1-L47 Maybe you need to take a look here, in case LD_LIBRARYPATH needs to be handled in addition to LIBPATH? - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429917901
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]
On Mon, 18 Dec 2023 11:30:59 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > Followed Thomas proposals Well done. Releasing the mutex before asserting is not necessary; we don't pull the handle table lock as part of error reporting. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786905733
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 10:06:34 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - trailing whitespace >> - Following most of Thomas proposals > > src/hotspot/os/aix/porting_aix.cpp line 934: > >> 932: struct scnhdr the_scn; >> 933: struct ldhdr the_ldr; >> 934: size_t sz = FILHSZ + _AOUTHSZ_EXEC; > > please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...` Also, can you please add STATIC_ASSERT(sizeof(the_xcoff) == xcoffsz); STATIC_ASSERT(sizeof(the_scn) == SCNHSZ); STATIC_ASSERT(sizeof(the_ldr) == LDHDRSZ); - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429853292
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Fri, 15 Dec 2023 11:57:51 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with two additional > commits since the last revision: > > - trailing whitespace > - Following most of Thomas proposals I like this, this is good. Small nits remain. src/hotspot/os/aix/os_aix.cpp line 30: > 28: #pragma alloca > 29: > 30: please remove whitespace change src/hotspot/os/aix/os_aix.cpp line 193: > 191: // local variables > 192: > 193: please remove whitespace change src/hotspot/os/aix/os_aix.cpp line 1113: > : } > 1112: > 1113: please remove whitespace change src/hotspot/os/aix/porting_aix.cpp line 934: > 932: struct scnhdr the_scn; > 933: struct ldhdr the_ldr; > 934: size_t sz = FILHSZ + _AOUTHSZ_EXEC; please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...` src/hotspot/os/aix/porting_aix.cpp line 990: > 988: if (env == nullptr) { > 989: // no LIBPATH, try with LD_LIBRARY_PATH > 990: env = getenv("LD_LIBRARY_PATH"); Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX. src/hotspot/os/aix/porting_aix.cpp line 1005: > 1003: // LIBPATH or LD_LIBRARY_PATH and second with burned in libpath. > 1004: // No check against current working directory > 1005: Libpath.print("%s:%s", env, rtv_linkedin_libpath()); Are you sure libpath env var has precedence over the baked-in libpath? src/hotspot/os/aix/porting_aix.cpp line 1097: > 1095: } > 1096: > 1097: pthread_mutex_lock(_handletable_mutex); You can make your life a lot easier by defining an RAII object at the start of the file: struct TableLocker { TableLocker() { pthread_mutex_lock(_handletable_mutex); } ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); } }; and just place this at the beginning of your two functions TableLocker lock: ... no need to manually unlock then, with the danger of missing a return. src/hotspot/os/aix/porting_aix.cpp line 1101: > 1099: for (i = 0; i < g_handletable_used; i++) { > 1100: if (g_handletable[i].handle == libhandle) { > 1101: // handle found, decrease refcount `assert(refcount > 0, "Sanity"))` src/hotspot/os/aix/porting_aix.cpp line 1143: > 1141: // entry of the array to the place of the entry we want to remove > and overwrite it > 1142: if (i < g_handletable_used) { > 1143: g_handletable[i] = g_handletable[g_handletable_used]; To be super careful, I would zero out at least the handle of the moved item like this: `g_handletable[g_handletable_used].handle = nullptr` - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786400492 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870755 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870833 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870885 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429849403 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429858465 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429859923 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429868182 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429863665 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870057
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 10:18:53 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/os_aix.cpp line 206: >> >>> 204: constexpr int max_handletable = 1024; >>> 205: static int g_handletable_used = 0; >>> 206: static struct handletableentry g_handletable[max_handletable] = {{0, >>> 0, 0, 0}}; >> >> I would move all that new and clearly delineated dlopen stuff into an own >> file, e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already >> have wrappers for other functions). os_aix.cpp is already massive. > > I moved the static variable declarations and the functions `Aix_dlopen(), > search_file_in_LIBPATH(), rtv_linkedin_libpath()` and `os::pd_dll_unload()` > to porting_aix.cpp. This links, but in my opinion `os::pd_dll_unload()` > should reside in os_aix.cpp, because it is member of the os class. But there > it will not compile anymore if the static variables are moved away. No, what I meant was to provide a "libc-like" equivalent for dlopen, similar to what we do with dladdr (see https://github.com/openjdk/jdk/blob/b7676822886eac21f61ff361a32928a966d8fe31/src/hotspot/os/aix/porting_aix.cpp#L306). But never mind; I am also fine with moving os::pd_dlopen into a different cpp file, e.g. "dlopen_aix.cpp". Just move it out of os_aix.cpp, since that is already massive and you add >300 lines of more code and more dependencies. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427812795
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 09:57:19 GMT, Joachim Kern wrote: > If we omit the xcoff32 we have to ensure that no xcoff32 executable file > comes into play. xcoff32 is for 32-bit binaries. The AIX port only exists for 64-bit, and there will never be a 32-bit AIX port, so there is no reason for handling 32-bit xcoff headers. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427803763
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 06:22:39 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> followed the proposals > > src/hotspot/os/aix/os_aix.cpp line 1129: > >> 1127: >> 1128: // get the library search path burned in to the executable file during >> linking >> 1129: // If the libpath cannot be retrieved return an empty path > > This is new. Is this complexity needed, if yes, why? Don't see a comment, may > have missed it. Also, why are we parsing xcoff32 headers in there? AIX OpenJDK will always be 64-bit. So, you can replace the whole xcoff32 section with assert( f_magic == U802TOCMAGIC, ..). The function becomes a lot simpler then. > src/hotspot/os/aix/os_aix.cpp line 1132: > >> 1130: static const char* rtv_linkedin_libpath() { >> 1131: static char buffer[4096]; >> 1132: static const char* libpath = 0; > > If your intent is to return an empty buffer if there is no contained libpath, > I would just: > > > static const char* libpath = ""; > > then you can always just return libpath. But looking at the using code, returning NULL in case there is no contained libpath would be actually easier, see below. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427609926 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427639138
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Tue, 12 Dec 2023 14:05:48 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > followed the proposals Is this libpath parsing code copied from the R3 kernel? If yes, pls make sure there are no licensing issues. src/hotspot/os/aix/os_aix.cpp line 206: > 204: constexpr int max_handletable = 1024; > 205: static int g_handletable_used = 0; > 206: static struct handletableentry g_handletable[max_handletable] = {{0, 0, > 0, 0}}; I would move all that new and clearly delineated dlopen stuff into an own file, e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already have wrappers for other functions). os_aix.cpp is already massive. src/hotspot/os/aix/os_aix.cpp line 1129: > 1127: > 1128: // get the library search path burned in to the executable file during > linking > 1129: // If the libpath cannot be retrieved return an empty path This is new. Is this complexity needed, if yes, why? Don't see a comment, may have missed it. src/hotspot/os/aix/os_aix.cpp line 1131: > 1129: // If the libpath cannot be retrieved return an empty path > 1130: static const char* rtv_linkedin_libpath() { > 1131: static char buffer[4096]; This coding has some issues: - a generic char buffer is not a good idea. Forces you to do casts all over the place, and introduces alignment issues with unaligned char buffer. Which I assume is the reason for all the separate memcpy-into-structures below. I would just read into the structures directly. - you need to check the return codes for fread to make sure you read the number of bytes expected, lest you work with uninitialized memory and maybe to handle sporadic EINTR. - I don't get all the separate "SZ" macros. They must be equal to sizeof(structure), right, otherwise you get buffer overruns or work with uninitialized memory? Proposal: add a local wrapper function like this: template static bool my_checked_fread(FILE* f, T* out) { // read sizeof(T) from f. // Check return code. // Return bool if sizeof(T) bytes were read. e.g. in a very trivial form: int bytesread = fread(out, sizeof(T), 1, f); return bytesread == sizeof(T); } and use it in your code like this: struct xcoff64 the_xcoff64; struct scn64 the_scn64; struct ldr64 the_ldr64; if (!my_checked_fread(f, _xcoff64)) { assert? } ... if (!my_checked_fread(f, _ldr64) { .. handle error } src/hotspot/os/aix/os_aix.cpp line 1132: > 1130: static const char* rtv_linkedin_libpath() { > 1131: static char buffer[4096]; > 1132: static const char* libpath = 0; If your intent is to return an empty buffer if there is no contained libpath, I would just: static const char* libpath = ""; then you can always just return libpath. src/hotspot/os/aix/os_aix.cpp line 1135: > 1133: > 1134: if (libpath) > 1135: return libpath; { } src/hotspot/os/aix/os_aix.cpp line 1137: > 1135: return libpath; > 1136: > 1137: char pgmpath[32+1]; Will overflow if pid_t is 64bit. Give it a larger size; after all, you are giving buffer 4K above, so you are not overly concerned with saving stack space. src/hotspot/os/aix/os_aix.cpp line 1146: > 1144: fread(buffer, 1, FILHSZ_64 + _AOUTHSZ_EXEC_64, f); > 1145: > 1146: if (((struct filehdr*)buffer)->f_magic == U802TOCMAGIC ) { as stated above, I don't think this section is needed. src/hotspot/os/aix/os_aix.cpp line 1170: >
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]
On Tue, 5 Dec 2023 13:21:35 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> encapsulate everything in os::Aix::dlopen > > src/hotspot/os/aix/os_aix.cpp line 3133: > >> 3131: return nullptr; >> 3132: } >> 3133: // library not still loaded and still place in array, so load >> library > > s/still/yet No need to be this verbose either, especially since the comment is somewhat misleading. "create entry at end of table" implies that we have a dynamically growing table and allocate new entries. Proposal: "Library not yet loaded; load it, then store its handle in handle table". - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415605856
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v2]
On Mon, 4 Dec 2023 12:33:26 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > improve handling of nonexisting files src/hotspot/os/aix/os_aix.cpp line 203: > 201: constexpr int max_handletable = 1024; > 202: static int g_handletable_used = 0; > 203: static struct handletableentry g_handletable[max_handletable] = > {{0,0,0,0}}; style nits: - we usually write the * behind type, not before var name - `{{0,0}}` -> insert spaces src/hotspot/os/aix/os_aix.cpp line 1159: > 1157: result = ::dlopen(filename, dflags); > 1158: if (result != nullptr) { > 1159: assert(false, "dll_load: Could not stat() file %s, but dlopen() > worked; Have to improve stat()", filename); use assert(result != nullptr) and remove condition - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413843503 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413846111
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]
On Tue, 5 Dec 2023 12:11:46 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > encapsulate everything in os::Aix::dlopen Excellent, this is how I have pictured a good solution. Very nice. A number of remarks, but nothing fundamental. src/hotspot/os/aix/os_aix.cpp line 1137: > 1135: if (ebuf != nullptr && ebuflen > 0) { > 1136: ::strncpy(ebuf, "dll_load: empty filename specified", ebuflen - > 1); > 1137: } Are there any cases where we don't hand in the error buffer? If so, I would just assert ebuf and ebuflen. No need for this kind of flexibility. src/hotspot/os/aix/os_aix.cpp line 3051: > 3049: > 3050: // Simulate the library search algorithm of dlopen() (in os::dll_load) > 3051: int os::Aix::stat64x_via_LIBPATH(const char* path, struct stat64x* > stat) { - no need to export this, make it filescope static - please return bool, with false = error - please rename it to something like "search_file_in_LIBPATH" src/hotspot/os/aix/os_aix.cpp line 3055: > 3053: return -1; > 3054: > 3055: char *path2 = strdup (path); Please use os::strdup and os::free. If you really intent to use the plain libc versions, use `::strdup` and `::free` to make sure - and indicate to code readers - you use the global libc variants. src/hotspot/os/aix/os_aix.cpp line 3059: > 3057: int idx = strlen(path2) - 1; > 3058: if (path2[idx] == ')') { > 3059: while (path2[idx] != '(' && idx > 0) idx--; ? Why not `strrchr()`? src/hotspot/os/aix/os_aix.cpp line 3067: > 3065: if (path2[0] == '/' || > 3066: (path2[0] == '.' && (path2[1] == '/' || > 3067: (path2[1] == '.' && path2[2] == '/' { This complexity is not needed, nor is it sufficient, since it does not handle relative paths ("mydirectory/hallo.so") https://www.ibm.com/docs/en/aix/7.1?topic=d-dlopen-subroutine "If FilePath contains a slash character, FilePath is used directly, and no directories are searched. " So, just scan for a '/' - if you find one, its a path to be opened directly: const bool use_as_filepath = strchr(path2, '/'); src/hotspot/os/aix/os_aix.cpp line 3085: > 3083: strcpy(libpath, env); > 3084: for (token = strtok_r(libpath, ":", ); token != nullptr; > token = strtok_r(nullptr, ":", )) { > 3085: sprintf(combined, "%s/%s", token, path2); You can save a lot of pain and manual labor by using `stringStream` here. stringStream combined; combined.print("%s/%s", token, path2); const char* combined_path_string = combined.base(); no need for manual allocation and byte counting. src/hotspot/os/aix/os_aix.cpp line 3099: > 3097: // filled by os::dll_load(). This way we mimic dl handle equality for a > library > 3098: // opened a second time, as it is implemented on other platforms. > 3099: void* os::Aix::dlopen(const char* filename, int Flags) { Does not need to be exported, nor does os::AIX::dlclose. Make file scope static. See my remarks in os_posix.cpp. src/hotspot/os/aix/os_aix.cpp line 3103: > 3101: struct stat64x libstat; > 3102: > 3103: if (os::Aix::stat64x_via_LIBPATH(filename, )) { Please return bool, not unix int -1, this hurts my brain :-) src/hotspot/os/aix/os_aix.cpp line 3108: > 3106: if (result != nullptr) { > 3107: assert(false, "dll_load: Could not stat() file %s, but dlopen() > worked; Have to improve stat()", filename); > 3108:
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]
On Mon, 4 Dec 2023 00:41:23 GMT, David Holmes wrote: > From the blog: > > > Yes! The methods are being deallocated for a class loader that is still > > alive. > > Okay so why does this happen and is it a reasonable thing to be happening? On > the surface it sounds wrong to deallocate anything associated with a live > classloader. This sounds odd to me to. I know that we deallocate the old *byte code* of re-transformed classes; but `Method*` ? - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1838413238
Re: RFR: 8320935: Move CDS config initialization code to cdsConfig.cpp [v2]
On Sat, 2 Dec 2023 00:38:58 GMT, Ioi Lam wrote: >> This is a simple clean up that moves the code for initializing the CDS >> config states from arguments.cpp to cdsConfig.cpp >> >> I renamed a few functions, but otherwise the code is unchanged. >> >> - `get_default_shared_archive_path()` -> `default_archive_path()` >> - `GetSharedArchivePath()` -> `static_archive_path()` >> - `GetSharedDynamicArchivePath()` -> `dynamic_archive_path()` >> >> There's also less `#if INCLUDE_CDS` since the entire cdsConfig.cpp file is >> compiled only if CDS is enabled. > > Ioi Lam has updated the pull request incrementally with one additional commit > since the last revision: > > fixed indentation Looks good. Did not find any functional difference to the original code. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16868#pullrequestreview-1760796058
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]
On Wed, 29 Nov 2023 11:49:31 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method >> redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` >> instance. Once that method gets redefined, the `jmethodID` is updated to >> point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original >> `jmethodID` will be redirected to the latest `Method` version and at the >> same time the 'previous' `Method` version will receive a new `jmethodID` >> pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or >> `GetAllStackTraces` JVMTI calls while this previous `Method` version is >> still on stack we will have the corresponding frame identified by a >> `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for >> cleanup at what time all contained `Method` instances. The cleanup process >> will not perform the `jmethodID` pointer maintenance and we will end up with >> pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to >> `ClassLoaderData` instance and all relevant `jmethodID`s will get >> batch-updated when the class loader is being released and all its classes >> are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being >> deallocate the associated `jmethodID` (if any) must not point to the >> deallocated instance once we are finished. Unfortunately, we can not just >> update the `jmethodID` values in bulk when purging an old class version - >> the per `InstanceKlass` jmethodID cache is present only for the main class >> version and contains `jmethodID` values for both the old and current method >> versions. >> >> ~Therefore we need to perform `jmethodID` lookup when we are about to >> deallocate a `Method` instance and clean up the pointer only if that >> `jmethodID` is pointing to the `Method` instance which is being deallocated.~ >> >> Therefore, we need to perform `jmethodID` lookup for each method in an old >> class version that is getting purged, and null out the pointer of that >> `jmethodID` to break the link from `jmethodID` to the method instance that >> is about to get deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my >> blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one > additional commit since the last revision: > > Restrict cleanup to obsolete methods only I won't be able to review this this week, too snowed in atm. I can take a look next week. We can always just revert the change if needed. Thinking about Skara, I think as long as we have this confusing mixture of rules (hotspot wants 2 reviewers that are Reviewer/Committer, but some jdk libs only want one, but then you need two for desktop I think otherwise Phil gets angry) - we should hard-code the 2-reviewer rule into skara as default since it affects the lion's share of all changes. - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1835474161
Re: RFR: 8320921: GHA: Parallelize hotspot_compiler test jobs
On Tue, 28 Nov 2023 18:15:56 GMT, Aleksey Shipilev wrote: > In current GHA, `hotspot_compiler` testing takes a long time, and often takes > the longest. On MacOS and Windows it routinely takes 60..80 minutes, while > other test groups run in 30..40 minutes. This often drags the total wall > clock time to run the GHA up. Fortunately, there are already several parts in > `hotspot_compiler` test group, which we might just run separately: > https://github.com/openjdk/jdk/blob/dc256fbc6490f8163adb286dbb7380c10e5e1e06/test/hotspot/jtreg/TEST.groups#L142-L146 > > There is a bit of awkwardness about `not_xcomp` subgroup, which does not have > lots of tests. Instead of doing another short job, I think we can just meld > it into the quickest "part 2". > > > # Before > linux-x64 / test (hs/tier1 compiler) 38m 27s > windows-x64 / test (hs/tier1 compiler): 1h 17m 30s > macos-x64 / test (hs/tier1 compiler): 1h 2m 53s > > # After > linux-x64 / test (hs/tier1 compiler part 1): 11m 38s > linux-x64 / test (hs/tier1 compiler part 2): 9m 34s > linux-x64 / test (hs/tier1 compiler part 3): 21m 40s > > windows-x64 / test (hs/tier1 compiler part 1): 24m 19s > windows-x64 / test (hs/tier1 compiler part 2): 18m 54s > windows-x64 / test (hs/tier1 compiler part 3): 43m 32s > > macos-x64 / test (hs/tier1 compiler part 1): 26m 44s > macos-x64 / test (hs/tier1 compiler part 2): 20m 40s > macos-x64 / test (hs/tier1 compiler part 3): 42m 10s > > > Additional testing: > - [x] GHA (current run is in progress, but I had clean runs before) Good! - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16860#pullrequestreview-1755113396
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Tue, 28 Nov 2023 11:27:33 GMT, suchismith1993 wrote: > > > > @tstuefe 3rd parameter to pass the either of 2 things: > > 1. The JvmTiAgent object "agent", so that after shifting the > save_library_signature to os_aix,we can still access the agent object.-> For > this i tried importing jvm/prims header file, but i get segmentation faults > during build . Not sure if i am doing it the right way. > > 2. Pass a character buffer(and not const char*) where we copy the > modified filename back to it and then use it in jvmAgent. code. Does not sound really appealing tbh. We pile one hack atop of another. Please synchronize with @JoKern65 at SAP. He will rewrite the JVMTI handler code, which will make this point moot. See https://bugs.openjdk.org/browse/JDK-8320890. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829776298
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Mon, 27 Nov 2023 15:44:58 GMT, suchismith1993 wrote: > > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if > > > > the condition fails for .so files, because i have to reload it again > > > > and check if the .a exists. In the shared code i had repeat less number > > > > of lines i believe. Do you suggest moving lines 1132 to 1139 to another > > > > function then ? > > > > > > > > > @tstuefe Any suggestion on this ? > > > > > > ``` > > --- a/src/hotspot/os/aix/os_aix.cpp > > +++ b/src/hotspot/os/aix/os_aix.cpp > > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, > > char* buf, > >return true; > > } > > > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) > > { > > > >log_info(os)("attempting shared library load of %s", filename); > > > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, > > int ebuflen) { > >return nullptr; > > } > > > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > + > > + void* result = nullptr; > > + > > + // First try using *.so suffix; failing that, retry with *.a suffix. > > + const size_t len = strlen(filename); > > + constexpr size_t safety = 3 + 1; > > + constexpr size_t bufsize = len + safety; > > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > > + strcpy(buf, filename); > > + char* const dot = strrchr(buf, '.'); > > + > > + assert(dot != nullptr, "Attempting to load a shared object without > > extension? %s", filename); > > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > > + "Attempting to load a shared object that is neither *.so nor *.a", > > filename); > > + > > + sprintf(dot, ".so"); > > + result = dll_load_inner(buf, ebuf, ebuflen); > > + > > + if (result == nullptr) { > > +sprintf(dot, ".a"); > > +result = dll_load_inner(buf, ebuf, ebuflen); > > + } > > + > > + FREE_C_HEAP_ARRAY(char, buf); > > + > > + return result; > > +} > > + > > ``` > > @tstuefe as discussed with @TheRealMDoerr do you think using default argument > will help ? Either we pass agent object as 3rd parameter or an empty > character buffer(and not const chat*) which would be spcifically used to copy > the alternate filename to it using strcpy so that it is reflected in the > jvmagent code ? A third parameter for what? - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829602040
Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]
On Mon, 27 Nov 2023 13:23:42 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> adopt types > > This now causes problems with > > https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 > > since it removes the possibility of silently alternating the file path inside > os::dll_load, which would be the preferred way for AIX to handle *.a shared > objects. So this causes even more ifdef AIX to bloom up everywhere. > > A much better solution would have been to mimic stable-handle behavior inside > the AIX version of `os::dll_load`. > > Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple > mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) > tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp > without any changes to shared coding, and would have provided complete > coverage for all users of dll_load. > @tstuefe: Hi Thomas, I'm not sure if I got it. We can make (inode, devid) to > a hash, which replaces the dlhandle on return of os::dlload. This hash would > of course be the same if the same library is loaded twice. But I do not know > how to handle the two real dlhandles. Why do you need two dlhandles? A handle returned from dlopen should be valid for the whole process. If you cache that in a hashmap, and for the second caller of os::dll_load() return the cached variant, this should work, no? - PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1828019003
Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]
On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , >> the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in >> [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try >> of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) >> was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of >> the root cause. >> The root cause is, that dlopen() on AIX returns different handles every >> time, even if you load a library twice. There is no official AIX API >> available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and >> st_inode. After a dlopen() the stat64x() API is called additionally to get >> this information which is then stored parallel to the library handle in the >> jvmtiAgent. For AIX we then can compare these values instead of the library >> handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > adopt types This now causes problems with https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 since it removes the possibility of silently alternating the file path inside os::dll_load, which would be the preferred way for AIX to handle *.a shared objects. So this causes even more ifdef AIX to bloom up everywhere. A much better solution would have been to mimic stable-handle behavior inside the AIX version of `os::dll_load`. Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp without any changes to shared coding, and would have provided complete coverage for all users of dll_load. - PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1827826924
Re: RFR: 8320699: Add parameter to skip progress logging of OutputAnalyzer
On Fri, 24 Nov 2023 10:34:02 GMT, Stefan Karlsson wrote: > Tests using ProcessTools.executeProcess gets the following output written to > stdout: > [2023-11-24T09:58:16.797540608Z] Gathering output for process 2517117 > [2023-11-24T09:58:23.070781345Z] Waiting for completion for process 2517117 > [2023-11-24T09:58:23.071045055Z] Waiting for completion finished for process > 2517117 > > This might be good for some use cases and debugging, but some tests spawns a > large number of processes and for those this output fills up the log files. > > I propose that we add a way to turn of this output for tests where we find > this output to be too noisy. I'm fine with this, but would prefer a global switch instead of hard-coding the value per test and handing it down. - PR Comment: https://git.openjdk.org/jdk/pull/16807#issuecomment-1825910975
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Fri, 24 Nov 2023 11:45:25 GMT, suchismith1993 wrote: > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the > > condition fails for .so files, because i have to reload it again and check > > if the .a exists. In the shared code i had repeat less number of lines i > > believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > @tstuefe Any suggestion on this ? --- a/src/hotspot/os/aix/os_aix.cpp +++ b/src/hotspot/os/aix/os_aix.cpp @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, return true; } -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { log_info(os)("attempting shared library load of %s", filename); @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { return nullptr; } +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { + + void* result = nullptr; + + // First try using *.so suffix; failing that, retry with *.a suffix. + const size_t len = strlen(filename); + constexpr size_t safety = 3 + 1; + constexpr size_t bufsize = len + safety; + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); + strcpy(buf, filename); + char* const dot = strrchr(buf, '.'); + + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, + "Attempting to load a shared object that is neither *.so nor *.a", filename); + + sprintf(dot, ".so"); + result = dll_load_inner(buf, ebuf, ebuflen); + + if (result == nullptr) { +sprintf(dot, ".a"); +result = dll_load_inner(buf, ebuf, ebuflen); + } + + FREE_C_HEAP_ARRAY(char, buf); + + return result; +} + - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1825721906
Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete [v2]
On Fri, 24 Nov 2023 09:51:07 GMT, Daniel Jeliński wrote: >> test/failure_handler/src/share/conf/windows.properties line 60: >> >>> 58: >>> 59: native.core.app=cdb >>> 60: native.core.args=-c ".dump /ma core.%p;qd" -p %p >> >> Just to double check that this is the right option. `/ma` terminates if >> there is "inaccessable memory" where `/mA` seems to be continue. The `/f` >> option says all "accessible" memory which I assume it means skips/ignored >> inaccessible, is that right? > > `/ma` was the option suggested by `.dump`; looking at the documentation, > `/mA` indeed seems to be a better choice. I'll update. > > `/f` used to create a [full user-mode > dump](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/user-mode-dump-files#full-user-mode-dumps). > I couldn't find any information about the behavior on inaccessible memory. > None of the options specify dumping inaccessible memory; I assume the > failures only happen if some memory becomes inaccessible while producing the > dump. Oh, I overlooked this. I think that is right, the old /f minidump generation worked around inaccessible memory. They were just "holes". - PR Review Comment: https://git.openjdk.org/jdk/pull/16806#discussion_r1404160596
Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete [v2]
On Fri, 24 Nov 2023 09:51:07 GMT, Daniel Jeliński wrote: >> test/failure_handler/src/share/conf/windows.properties line 60: >> >>> 58: >>> 59: native.core.app=cdb >>> 60: native.core.args=-c ".dump /ma core.%p;qd" -p %p >> >> Just to double check that this is the right option. `/ma` terminates if >> there is "inaccessable memory" where `/mA` seems to be continue. The `/f` >> option says all "accessible" memory which I assume it means skips/ignored >> inaccessible, is that right? > > `/ma` was the option suggested by `.dump`; looking at the documentation, > `/mA` indeed seems to be a better choice. I'll update. > > `/f` used to create a [full user-mode > dump](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/user-mode-dump-files#full-user-mode-dumps). > I couldn't find any information about the behavior on inaccessible memory. > None of the options specify dumping inaccessible memory; I assume the > failures only happen if some memory becomes inaccessible while producing the > dump. Oh, I overlooked this. I think that is right, the old /f minidump generation worked around inaccessible memory. They were just "holes". - PR Review Comment: https://git.openjdk.org/jdk/pull/16806#discussion_r1404160596