Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default

2024-05-09 Thread Thomas Stuefe
On Thu, 9 May 2024 05:04:47 GMT, Thomas Stuefe  wrote:

>> On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, 
>> and (when run on Mac M1 hardware) 16K.
>> 
>> Since forgetting to specify `--enable-compatible-cds-alignment` is a common 
>> error that is only noticed when running the produced JVM on hardware with 
>> different page size, we propose to enable that option by default on Linux 
>> aarch64. The cost is a moderate increase in CDS archive size (about 300K).
>> 
>> I tested this patch on:
>> - x64 Linux
>> - x64 Linux, crossbuilding to aarch64
>> - building natively on aarch64 Linux
>
> Does anyone else have an opinion on this matter? The question is:
> 
> "Should we enable cds-compatible alignment in the configure options by 
> default generally, not only for Linux aarch64" ?
> 
> This would, in addition to the proposed change affecting Linux on aarch64, 
> also affect MacOS x64. JVMs built there would be able to run, by default, in 
> Rosetta. At the price of sligthly increased archive size of a dozen KB or so.
> 
> If nobody objects, I will do that since I like the simplicity of it.

> @tstuefe We already have an issue to reset the default to true (raised by 
> @iklam)
> 
> https://bugs.openjdk.org/browse/JDK-8331952

Okay. I see it was created yesterday evening. @iklam how do you want to handle 
this? Do you want to proceed with 8331952 and I close this as duplicate? Or 
vice versa?

-

PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2102339405


Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default

2024-05-08 Thread Thomas Stuefe
On Wed, 8 May 2024 15:14:16 GMT, Thomas Stuefe  wrote:

> On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, 
> and (when run on Mac M1 hardware) 16K.
> 
> Since forgetting to specify `--enable-compatible-cds-alignment` is a common 
> error that is only noticed when running the produced JVM on hardware with 
> different page size, we propose to enable that option by default on Linux 
> aarch64. The cost is a moderate increase in CDS archive size (about 300K).
> 
> I tested this patch on:
> - x64 Linux
> - x64 Linux, crossbuilding to aarch64
> - building natively on aarch64 Linux

Does anyone else have an opinion on this matter? The question is:

"Should we enable cds-compatible alignment in the configure options by default 
generally, not only for Linux aarch64" ?

This would, in addition to the proposed change affecting Linux on aarch64, also 
affect MacOS x64. JVMs built there would be able to run, by default, in 
Rosetta. At the price of sligthly increased archive size of a dozen KB or so.

If nobody objects, I will do that since I like the simplicity of it.

-

PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2101941697


Re: RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default

2024-05-08 Thread Thomas Stuefe
On Wed, 8 May 2024 17:01:26 GMT, Andrew Haley  wrote:

> This is obviously correct. Otherwise, the CDS archive will only work if the 
> JDK happens to have been built on a machine with greater or equal page size.
> 
> Does the default `compatible-cds-alignment=false` make any sense anywhere on 
> other platforms, though? We might simply make compatibility the default.

The configure option is also evaluated for MacOS x64 to enforce an alignment of 
16K, to be able to run on MacOS aarch64 in Rosetta. Seeing how MacOS x64 is not 
really important anymore, and losses due to 16K alignment are small, I'd be 
fine with setting the default to compatible. It would be a simpler patch, 
certainly.

-

PR Comment: https://git.openjdk.org/jdk/pull/19142#issuecomment-2101138971


RFR: 8331942: On Linux aarch64, CDS archives should be using 64K alignment by default

2024-05-08 Thread Thomas Stuefe
On Linux aarch64, a JVM may encounter three different page sizes: 4K, 64K, and 
(when run on Mac M1 hardware) 16K.

Since forgetting to specify `--enable-compatible-cds-alignment` is a common 
error that is only noticed when running the produced JVM on hardware with 
different page size, we propose to enable that option by default on Linux 
aarch64. The cost is a moderate increase in CDS archive size (about 300K).

I tested this patch on:
- x64 Linux
- x64 Linux, crossbuilding to aarch64
- building natively on aarch64 Linux

-

Commit messages:
 - aarch64-should-default-to-64k-alignment-for-cds-regions

Changes: https://git.openjdk.org/jdk/pull/19142/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19142=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331942
  Stats: 12 lines in 1 file changed: 11 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19142.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19142/head:pull/19142

PR: https://git.openjdk.org/jdk/pull/19142


Re: RFR: 8329581: Java launcher no longer prints a stack trace [v9]

2024-05-08 Thread Thomas Stuefe
On Mon, 6 May 2024 19:06:10 GMT, Sonia Zaldana Calles  
wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fixing indentation
>   
>   Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com>

Pre-existing: Man, I cannot grok the complex return code handling, tbh.

We have the local `ret` variable holding a return code. We also hand codes to 
CHECK_EXCEPTION_LEAVE as macro argument. But we don't hand codes to 
CHECK_EXCEPTION_NULL_LEAVE. LEAVE uses the locally defined `ret` instead of 
getting the return code as argument. CHECK_EXCEPTION_LEAVE modifies the local 
`ret`, but CHECK_EXCEPTION_NULL_LEAVE does not.

CHECK_EXCEPTION_NULL_LEAVE does not set `ret`. So, in case of an error, it 
would cause the launcher to return OK, but this does not happen because the 
local `ret` gets initialized to 1 before the first call to 
CHECK_EXCEPTION_NULL_LEAVE (line 566 resp. 560). Not sure if this was 
intentional, but it surely is very brittle. We rely on the content of `ret`, 
and that changes several times throughout JavaMain.

CHECK_EXCEPTION_NULL_LEAVE argument is named CENL_exception, which I don't 
understand.

To confuse matters more, the logic for internal error codes and the launcher 
return code is reversed: internally, 0 means error, and externally, 0 means 
success. And we only use numerical literals (`1`, `0`) instead of clearly named 
constants.

This may be food for another RFE, to keep this patch minimal. But a good 
solution, to me, would be like this:

- have the same logic for return codes (1 = error, 0 = success) to ease 
understanding
- have clearly named constants (e.g. "LAUNCHER_OK" 0, "LAUNCHER_ERR" = 1)
- have the LEAVE macro take the launcher return code as argument
- have all xxx_LEAVE macros pass in LAUNCHER_ERR to LEAVE
- call the final LEAVE with LAUNCHER_OK
- optionally, define something like "LEAVE_ERR" and "LEAVE_OK" that call LEAVE 
with either LAUNCHER_ERR or LAUNCHER_OK, for more concise coding.

For this patch, I think the return code logic is okay, but I would feel better 
if others double-checked.

src/java.base/share/native/libjli/java.c line 394:

> 392: if ((*env)->ExceptionOccurred(env)) { \
> 393: return 0; \
> 394: } else if (obj == NULL) { \

Side note, I first wondered if this comparison is strictly correct, since we 
now pass in `jmethodID` as well as `jobject`, which are opaque types and not 
necessarily of the same size.

But seems that jmethodID==NULL is defined to mean "invalid" [1] by the spec. 
Requiring NULL instead of providing an opaque invalid constant feels like an 
odd choice in the original JNI spec, since it requires implementors to use a 
pointer type to implement jmethodID? Which we do, in OpenJDK [2].

[1] 
https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#getstaticmethodid
[2] 
https://github.com/openjdk/jdk/blob/2baacfc16916220846743c6e49a99a6c41cac510/src/java.base/share/native/include/jni.h#L135-L136

src/java.base/share/native/libjli/java.c line 420:

> 418: jmethodID mainID =
> 419: (*env)->GetMethodID(env, mainClass, "main", 
> "([Ljava/lang/String;)V");
> 420: CHECK_EXCEPTION_NULL_FAIL(mainID);

Is there a particular reason why you moved this section up here, from line 432 
before? If not, I'd restore it to its original position to keep the diff small.

src/java.base/share/native/libjli/java.c line 452:

> 450: jobject mainObject = (*env)->NewObject(env, mainClass, constructor);
> 451: CHECK_EXCEPTION_NULL_FAIL(mainObject);
> 452: jmethodID mainID = (*env)->GetMethodID(env, mainClass, "main", 
> "()V");

Unnecessary change. Please restore 

Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v6]

2024-04-26 Thread Thomas Stuefe
On Thu, 25 Apr 2024 13:22:01 GMT, Matthias Baesken  wrote:

>> We have already good JLI tracing capabilities. But GetApplicationHome and 
>> GetApplicationHomeFromDll lack some tracing and should be enhanced.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   remove /jre path check

Change still good

> Still looks good. We should maybe change the synopsis (title) of the bug to 
> something like "libjli GetApplicationHome cleanups"?

Well, it does enhance tracing, so the title is not lying :)

-

PR Comment: https://git.openjdk.org/jdk/pull/18699#issuecomment-2078751493


Re: RFR: 8329862: GetApplicationHome and GetApplicationHomeFromDll enhance jli tracing [v5]

2024-04-25 Thread Thomas Stuefe
On Tue, 23 Apr 2024 14:31:44 GMT, Matthias Baesken  wrote:

>> We have already good JLI tracing capabilities. But GetApplicationHome and 
>> GetApplicationHomeFromDll lack some tracing and should be enhanced.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   adjust output

LGTM

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18699#pullrequestreview-2022399268


Re: RFR: 8322043: HeapDumper should use parallel dump by default [v3]

2024-04-17 Thread Thomas Stuefe
On Wed, 17 Apr 2024 07:22:55 GMT, David Holmes  wrote:

>> I think it makes the code more flexible - it allows to distinguish between 
>> "use default value" and "I don't care" cases.
>
> I'm not sure it is a worthwhile distinction. Not passing an actual parameter 
> means "I don't care - take the default".

I agree with @dholmes-ora. Let's keep things simple.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18748#discussion_r1568833440


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe  wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains ten additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - jcheck fixes
>>  - Fix tests
>>  - Implement Metrics.isContainerized()
>>  - Some clean-up
>>  - Drop cgroups testing on plain Linux
>>  - Implement fall-back logic for non-ro controller mounts
>>  - Make find_ro static and local to compilation unit
>>  - 8261242: [Linux] OSContainer::is_containerized() returns true
>
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:
> 
>> 168: }
>> 169:   }
>> 170:   return false;
> 
> An alternative, simpler, no need for modifying source string:
> 
> static bool find_ro_opt(const char* o) {
>   return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
> }

Please disregard my comment. Albeit longer, your version is clearer to read and 
more fault tolerant.

> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:
> 
>> 349: //
>> 350: // We collect the read only mount option in the cgroup infos so as 
>> to have that
>> 351: // info ready when determining is_containerized().
> 
> Here, and in other places: a comment indicating the line format we scan would 
> be appreciated, possibly with argument numbers. Saves the casual code reader 
> from looking into proc man page. Even just pasting the example line for proc 
> manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) 
> (but with order adapted to your scanf call, they count major:minor as one)

Trying to parse the `%s%*[^-]-`

So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- 
parses everything that is not a dash, until we encounter the dash? Then we eat 
the dash? This is to skip the optionals?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf  wrote:

>> Please review this enhancement to the container detection code which allows 
>> it to figure out whether the JVM is actually running inside a container 
>> (`podman`, `docker`, `crio`), or with some other means that enforces 
>> memory/cpu limits by means of the cgroup filesystem. If neither of those 
>> conditions hold, the JVM runs in not containerized mode, addressing the 
>> issue described in the JBS tracker. For example, on my Linux system 
>> `is_containerized() == false" is being indicated with the following trace 
>> log line:
>> 
>> 
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false 
>> because no cpu or memory limit is present
>> 
>> 
>> This state is being exposed by the Java `Metrics` API class using the new 
>> (still JDK internal) `isContainerized()` method. Example:
>> 
>> 
>> java -XshowSettings:system --version
>> Operating System Metrics:
>> Provider: cgroupv1
>> System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>> 
>> 
>> The basic property this is being built on is the observation that the cgroup 
>> controllers typically get mounted read only into containers. Note that the 
>> current container tests assert that `OSContainer::is_containerized() == 
>> true` in various tests. Therefore, using the heuristic of "is any memory or 
>> cpu limit present" isn't sufficient. I had considered that in an earlier 
>> iteration, but many container tests failed.
>> 
>> Overall, I think, with this patch we improve the current situation of 
>> claiming a containerized system being present when it's actually just a 
>> regular Linux system.
>> 
>> Testing:
>> 
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 
>> (including gtests)
>> - [x] Some manual testing using cri-o
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains ten additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - jcheck fixes
>  - Fix tests
>  - Implement Metrics.isContainerized()
>  - Some clean-up
>  - Drop cgroups testing on plain Linux
>  - Implement fall-back logic for non-ro controller mounts
>  - Make find_ro static and local to compilation unit
>  - 8261242: [Linux] OSContainer::is_containerized() returns true

I am not enough of a container expert to judge if the basic approach is right - 
I trust you on this. This is just a technical code review.

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:

> 168: }
> 169:   }
> 170:   return false;

An alternative, simpler, no need for modifying source string:

static bool find_ro_opt(const char* o) {
  return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
}

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:

> 349: //
> 350: // We collect the read only mount option in the cgroup infos so as 
> to have that
> 351: // info ready when determining is_containerized().

Here, and in other places: a comment indicating the line format we scan would 
be appreciated, possibly with argument numbers. Saves the casual code reader 
from looking into proc man page. Even just pasting the example line for proc 
manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but 
with order adapted to your scanf call, they count major:minor as one)

src/hotspot/os/linux/osContainer_linux.cpp line 78:

> 76:   const char *reason;
> 77:   bool any_mem_cpu_limit_present = false;
> 78:   bool ctrl_ro = cgroup_subsystem->is_containerized();

nit: naming? what does ctrl mean in this case? Maybe use 
"cgroup_is_containerized"?

src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375:

> 373: if (!c.isContainerized()) {
> 374: ostream.println(INDENT + "System not containerized.");
> 375: return;

Why return here? Would this not cut the output short in the non-containerized 
case?

And if this not intended, the not-containerized-`-XshowSettings:system` test 
below should test and catch this (e.g. scan for CPU set)

-

PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe  wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains ten additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - jcheck fixes
>>  - Fix tests
>>  - Implement Metrics.isContainerized()
>>  - Some clean-up
>>  - Drop cgroups testing on plain Linux
>>  - Implement fall-back logic for non-ro controller mounts
>>  - Make find_ro static and local to compilation unit
>>  - 8261242: [Linux] OSContainer::is_containerized() returns true
>
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:
> 
>> 168: }
>> 169:   }
>> 170:   return false;
> 
> An alternative, simpler, no need for modifying source string:
> 
> static bool find_ro_opt(const char* o) {
>   return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
> }

Please disregard my comment. Albeit longer, your version is clearer to read and 
more fault tolerant.

> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:
> 
>> 349: //
>> 350: // We collect the read only mount option in the cgroup infos so as 
>> to have that
>> 351: // info ready when determining is_containerized().
> 
> Here, and in other places: a comment indicating the line format we scan would 
> be appreciated, possibly with argument numbers. Saves the casual code reader 
> from looking into proc man page. Even just pasting the example line for proc 
> manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) 
> (but with order adapted to your scanf call, they count major:minor as one)

Trying to parse the `%s%*[^-]-`

So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- 
parses everything that is not a dash, until we encounter the dash? Then we eat 
the dash? This is to skip the optionals?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209


Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]

2024-04-16 Thread Thomas Stuefe
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf  wrote:

>> Please review this enhancement to the container detection code which allows 
>> it to figure out whether the JVM is actually running inside a container 
>> (`podman`, `docker`, `crio`), or with some other means that enforces 
>> memory/cpu limits by means of the cgroup filesystem. If neither of those 
>> conditions hold, the JVM runs in not containerized mode, addressing the 
>> issue described in the JBS tracker. For example, on my Linux system 
>> `is_containerized() == false" is being indicated with the following trace 
>> log line:
>> 
>> 
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false 
>> because no cpu or memory limit is present
>> 
>> 
>> This state is being exposed by the Java `Metrics` API class using the new 
>> (still JDK internal) `isContainerized()` method. Example:
>> 
>> 
>> java -XshowSettings:system --version
>> Operating System Metrics:
>> Provider: cgroupv1
>> System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 
>> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>> 
>> 
>> The basic property this is being built on is the observation that the cgroup 
>> controllers typically get mounted read only into containers. Note that the 
>> current container tests assert that `OSContainer::is_containerized() == 
>> true` in various tests. Therefore, using the heuristic of "is any memory or 
>> cpu limit present" isn't sufficient. I had considered that in an earlier 
>> iteration, but many container tests failed.
>> 
>> Overall, I think, with this patch we improve the current situation of 
>> claiming a containerized system being present when it's actually just a 
>> regular Linux system.
>> 
>> Testing:
>> 
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 
>> (including gtests)
>> - [x] Some manual testing using cri-o
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains ten additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - jcheck fixes
>  - Fix tests
>  - Implement Metrics.isContainerized()
>  - Some clean-up
>  - Drop cgroups testing on plain Linux
>  - Implement fall-back logic for non-ro controller mounts
>  - Make find_ro static and local to compilation unit
>  - 8261242: [Linux] OSContainer::is_containerized() returns true

I am not enough of a container expert to judge if the basic approach is right - 
I trust you on this. This is just a technical code review.

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170:

> 168: }
> 169:   }
> 170:   return false;

An alternative, simpler, no need for modifying source string:

static bool find_ro_opt(const char* o) {
  return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,");
}

src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351:

> 349: //
> 350: // We collect the read only mount option in the cgroup infos so as 
> to have that
> 351: // info ready when determining is_containerized().

Here, and in other places: a comment indicating the line format we scan would 
be appreciated, possibly with argument numbers. Saves the casual code reader 
from looking into proc man page. Even just pasting the example line for proc 
manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but 
with order adapted to your scanf call, they count major:minor as one)

src/hotspot/os/linux/osContainer_linux.cpp line 78:

> 76:   const char *reason;
> 77:   bool any_mem_cpu_limit_present = false;
> 78:   bool ctrl_ro = cgroup_subsystem->is_containerized();

nit: naming? what does ctrl mean in this case? Maybe use 
"cgroup_is_containerized"?

src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375:

> 373: if (!c.isContainerized()) {
> 374: ostream.println(INDENT + "System not containerized.");
> 375: return;

Why return here? Would this not cut the output short in the non-containerized 
case?

And if this not intended, the not-containerized-`-XshowSettings:system` test 
below should test and catch this (e.g. scan for CPU set)

-

PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124
PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Tue, 16 Apr 2024 07:55:26 GMT, Thomas Stuefe  wrote:

>> Hi folks, 
>> 
>> This PR aims to fix 
>> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
>> 
>> I think the regression got introduced in 
>> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
>> 
>> In the issue linked above, 
>> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>>  got removed to simplify launcher code.
>> 
>> Previously, we used ```getMainType``` to do the appropriate main method 
>> invocation in ```JavaMain```. However, we currently attempt to do all types 
>> of main method invocations at the same time 
>> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>>  
>> 
>> Note how all of these invocations clear the exception reported with 
>> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>>  
>> 
>> Therefore, if a legitimate exception comes up during one of these 
>> invocations, it does not get reported. 
>> 
>> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
>> forward to your suggestions. 
>> 
>> Cheers, 
>> Sonia
>
> src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912:
> 
>> 910: private static final int MAIN_WITHOUT_ARGS = 1;
>> 911: private static final int MAIN_NONSTATIC = 2;
>> 912: private static int mainType = 0;
> 
> Nit: transferring information from java to C in this way is usually done, 
> AFAICS, by accessing static fields directly 
> (GetStaticFieldID+GetStaticXXXField). There is also a helper function that 
> wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available 
> in libjli though. There are many examples for both patterns.
> 
> I also would consider just declaring two booleans here, isStatic and hasArgs, 
> which that would be a bit clearer to read instead of combining both into a 
> single flag variable, and no need to keep flag values synced across java/C.

Thinking about this some more, would it not be possible to just use the 
mainMethod directly down in C?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566948449


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles  
wrote:

> Hi folks, 
> 
> This PR aims to fix 
> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
> 
> I think the regression got introduced in 
> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
> 
> In the issue linked above, 
> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>  got removed to simplify launcher code.
> 
> Previously, we used ```getMainType``` to do the appropriate main method 
> invocation in ```JavaMain```. However, we currently attempt to do all types 
> of main method invocations at the same time 
> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>  
> 
> Note how all of these invocations clear the exception reported with 
> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>  
> 
> Therefore, if a legitimate exception comes up during one of these 
> invocations, it does not get reported. 
> 
> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
> forward to your suggestions. 
> 
> Cheers, 
> Sonia

> Isn't this already fixed by #18753?

> Yes, it would be good to try the changes in pull/18753 first.

I still like @SoniaZaldana 's variant better.

AFAIU, in the original variant, we choose the main method via 
MethodFinder.findMainMethod, but then down in JavaMain in C we just try to 
invoke all variants in succession 
(static,args->instance,args->static,noargs->instance,noargs). Does that no mean 
we ignore the decision of findMainMethod?

Apart from that, it means fewer exceptions are thrown when starting the JVM if 
we avoid calling methods we know don't work.

-

PR Comment: https://git.openjdk.org/jdk/pull/18786#issuecomment-2058521084


Re: RFR: 8329581: Java launcher no longer prints a stack trace

2024-04-16 Thread Thomas Stuefe
On Mon, 15 Apr 2024 18:25:02 GMT, Sonia Zaldana Calles  
wrote:

> Hi folks, 
> 
> This PR aims to fix 
> [JDK-8329581](https://bugs.openjdk.org/browse/JDK-8329581). 
> 
> I think the regression got introduced in 
> [JDK-8315458](https://bugs.openjdk.org/browse/JDK-8315458). 
> 
> In the issue linked above, 
> [LauncherHelper#getMainType](https://github.com/openjdk/jdk/pull/16461/files#diff-108a3a3e3c2d108c8c7f19ea498f641413b7c9239ecd2975a6c27d904c2ba226)
>  got removed to simplify launcher code.
> 
> Previously, we used ```getMainType``` to do the appropriate main method 
> invocation in ```JavaMain```. However, we currently attempt to do all types 
> of main method invocations at the same time 
> [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjli/java.c#L623).
>  
> 
> Note how all of these invocations clear the exception reported with 
> [CHECK_EXCEPTION_FAIL](https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/src/java.base/share/native/libjli/java.c#L390).
>  
> 
> Therefore, if a legitimate exception comes up during one of these 
> invocations, it does not get reported. 
> 
> I propose reintroducing ```LauncherHelper#getMainType``` but I'm looking 
> forward to your suggestions. 
> 
> Cheers, 
> Sonia

Good find.

src/java.base/share/classes/sun/launcher/LauncherHelper.java line 912:

> 910: private static final int MAIN_WITHOUT_ARGS = 1;
> 911: private static final int MAIN_NONSTATIC = 2;
> 912: private static int mainType = 0;

Nit: transferring information from java to C in this way is usually done, 
AFAICS, by accessing static fields directly 
(GetStaticFieldID+GetStaticXXXField). There is also a helper function that 
wraps this, see JNU_GetStaticFieldByName. Not sure if you have that available 
in libjli though. There are many examples for both patterns.

I also would consider just declaring two booleans here, isStatic and hasArgs, 
which that would be a bit clearer to read instead of combining both into a 
single flag variable, and no need to keep flag values synced across java/C.

-

PR Review: https://git.openjdk.org/jdk/pull/18786#pullrequestreview-2002891641
PR Review Comment: https://git.openjdk.org/jdk/pull/18786#discussion_r1566896963


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-13 Thread Thomas Stuefe
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

I have one question, and I'm sorry if it has been answered before. How 
expensive is changing the mode? Is it just a status variable in user-space 
pthread lib? Or does it need a system call? 

In other words, how fine granular can we get without incurring too high a cost?

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2053721713


Re: RFR: 8322043: HeapDumper should use parallel dump by default

2024-04-13 Thread Thomas Stuefe
On Fri, 12 Apr 2024 18:50:37 GMT, Alex Menkov  wrote:

> > I am curious: what is the memory overhead for parallel mode, and (I am not 
> > familiar with the logic) how many threads are involved? Is the number of 
> > thread bounded?
> > I ask because, especially for the OnOOM handling, we may already be at a 
> > limit memory-wise. Starting to swap will probably be worse than running 
> > single-threaded.
> 
> Good question. It think it's several MB per each additional thread (1MB 
> output buffer, DumperClassCacheTable - 1031 elements max, element size 
> depends on class field numbers, if HeapDumpGzipLevel is set, some buffers for 
> gzip compressors) Number of threads by default is min of 
> `os::initial_active_processor_count() * 3 / 8` and number of GC workers.

For the OOM case, I would probably make it somehow dependent on 
os::free_memory() then.

-

PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2053517286


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-12 Thread Thomas Stuefe
On Fri, 12 Apr 2024 10:07:48 GMT, Claes Redestad  wrote:

> I guess Lilliput adds some hard or soft limit on the number of classes loaded?

Yes, we are concerned with that, especially for a possible future where 
Lilliput is the sole default. Atm we can address about 4 million classes. There 
are thoughts about making this number of classes infinite, but if possible we 
would like to avoid that complexity.

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051597166


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-12 Thread Thomas Stuefe
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad  wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression 
> trees taking excessive JIT compilation resources by reviving (part of) the 
> simple bytecode-generating strategy that was originally available as an 
> all-or-nothing strategy choice. 
> 
> Instead of reintroducing a binary strategy choice I propose a threshold 
> parameter, controlled by 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
> below or at this threshold there's no change, for expressions with an arity 
> above it we use the `StringBuilder`-chain bytecode generator. 
> 
> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low. It may also produce 
> worse performance on average. On the other hand there is the observed 
> increase in C2 resource usage as expressions grow unwieldy. On the other 
> other hand high arity expressions are likely rare to begin with, with less 
> opportunities for sharing than the more common low-arity expressions. 
> 
> I turned the submitted test case into a few JMH benchmarks and did some 
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
> 
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
> 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
> 
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
> at the vast majority of call sites.
> 
> I was asked to use the new class file API for mainline. There's a version of 
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis 
> for a backport.

Fyi: 
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051184229


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases

2024-04-12 Thread Thomas Stuefe
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad  wrote:

> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low

Stupid question maybe, this would be one LambdaForm per argument set? E.g. 
would two invocations with the same arguments re-use the same LambdaForm?

I would like to get an understanding of the numbers of classes involved for 
these solutions, and whether they are bounded. Mostly for Lilliput reason.

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051082454


Re: RFR: 8322043: HeapDumper should use parallel dump by default

2024-04-12 Thread Thomas Stuefe
On Fri, 12 Apr 2024 02:17:34 GMT, Alex Menkov  wrote:

> The fix makes VM heap dumping parallel by default.
> `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default, the fix 
> affects `HotSpotDiagnosticMXBean.dumpHeap()`, `-XX:+HeapDumpBeforeFullGC`, 
> `-XX:+HeapDumpAfterFullGC` and `-XX:+HeapDumpOnOutOfMemoryError`.
> 
> Testing:
>   - manually tested different heap dump scenarios with `-Xlog:heapdump`;
>   - tier1,tier2,hs-tier5-svc;
>   - all reg.tests that use heap dump.

I am curious: what is the memory overhead for parallel mode, and (I am not 
familiar with the logic) how many threads are involved? Is the number of thread 
bounded?

I ask because, especially for the OnOOM handling, we may already be at a limit 
memory-wise. Starting to swap will probably be worse than running 
single-threaded.

-

PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2051046673


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]

2024-04-10 Thread Thomas Stuefe
On Wed, 10 Apr 2024 13:46:11 GMT, Martin Doerr  wrote:

>> In my humble opinion the inclusion of alloca.h was slightly cleaner, but I 
>> guess it doesn't matter. Out of curiosity, why do you guys prefer not 
>> including it?
>
> When only looking at AIX code, I think the inclusion of alloca.h was cleaner. 
> Agreed. The new code makes AIX behave like other platforms and avoids the AIX 
> specific part in shared code.
> I could live with either version.

I can live with either, too.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1559536724


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v7]

2024-04-10 Thread Thomas Stuefe
On Wed, 10 Apr 2024 12:15:34 GMT, Joachim Kern  wrote:

>> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building 
>> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect 
>> clang by another name, and it uses the clang toolchain in the JDK build. 
>> Thus the old xlc toolchain was removed by 
>> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701).
>> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the 
>> last xlc rudiment.
>> This means merging the AIX specific content of 
>> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp 
>> into the corresponding gcc files on the on side and removing the 
>> defined(TARGET_COMPILER_xlc) blocks in the code, because the 
>> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX 
>> compiler.
>> The rest of the changes are needed because of using 
>> utilities/compilerWarnings_gcc.hpp the compiler is much more nagging about 
>> ill formatted printf
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   saver solution

This looks good to me now, provided Martin likes it too. Thanks for 
incorporating my suggestions.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18536#pullrequestreview-1991847641


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]

2024-04-09 Thread Thomas Stuefe
On Tue, 2 Apr 2024 09:19:16 GMT, Joachim Kern  wrote:

>> Hi Thomas,
>> I would like to get totally rid of this, because as I mentioned IBM already 
>> modified the `stdlib.h` header not using `#define malloc vec_malloc` any 
>> more (and all the other vec_... defines). We have to ask the adoptium 
>> colleagues at IBM if they already have raised their build environment by the 
>> 2 SP levels needed.
>> In principle we had to do the same workaround for `calloc, free,...` too, 
>> but they didn't show up as errors in the logging files.
>> These lines where never meant to stay for long. Just to be able to compile 
>> until IBM fixes the issue, which is done now.
>
> @suchismith1993
> Hi Suchi, can you please tell me when you will raise your build environment 
> from AIX 7.2 TL5 SP5 to SP7?
> I' am asking you, because I want to get rid of this nasty workaround.

Pinging @sxa - what build environment does temurin use for AIX?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558020493


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]

2024-04-09 Thread Thomas Stuefe
On Tue, 2 Apr 2024 16:14:12 GMT, Joachim Kern  wrote:

>> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building 
>> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect 
>> clang by another name, and it uses the clang toolchain in the JDK build. 
>> Thus the old xlc toolchain was removed by 
>> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701).
>> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the 
>> last xlc rudiment.
>> This means merging the AIX specific content of 
>> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp 
>> into the corresponding gcc files on the on side and removing the 
>> defined(TARGET_COMPILER_xlc) blocks in the code, because the 
>> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX 
>> compiler.
>> The rest of the changes are needed because of using 
>> utilities/compilerWarnings_gcc.hpp the compiler is much more nagging about 
>> ill formatted printf
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   version check not needed anymore

src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 440:

> 438:   st->print("pc =" INTPTR_FORMAT "  ", (unsigned 
> long)uc->uc_mcontext.jmp_context.iar);
> 439:   st->print("lr =" INTPTR_FORMAT "  ", (unsigned 
> long)uc->uc_mcontext.jmp_context.lr);
> 440:   st->print("ctr=" INTPTR_FORMAT "  ", (unsigned 
> long)uc->uc_mcontext.jmp_context.ctr);

p2i

src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 443:

> 441:   st->cr();
> 442:   for (int i = 0; i < 32; i++) {
> 443: st->print("r%-2d=" INTPTR_FORMAT "  ", i, (unsigned 
> long)uc->uc_mcontext.jmp_context.gpr[i]);

p2i

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558017408
PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558017827


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3]

2024-04-09 Thread Thomas Stuefe
On Tue, 2 Apr 2024 10:26:42 GMT, Joachim Kern  wrote:

>> src/hotspot/os/aix/os_aix.cpp line 314:
>> 
>>> 312:   ErrnoPreserver ep;
>>> 313:   log_trace(os, map)("disclaim failed: " RANGEFMT " errno=(%s)",
>>> 314:  RANGEFMTARGS(p, (long)maxDisclaimSize),
>> 
>> Wait, why are these casts needed? maxDisclaimSize is size_t, RANGEFMT uses 
>> SIZE_FORMAT. That should work without cast.
>
> Hi Thomas, `maxDisclaimSize` is of type `unsigned int`; therefore I get the 
> following warning:
> 
> os/aix/os_aix.cpp:314:42: error: format specifies type 'unsigned long' but 
> the argument has type 'unsigned int' [-Werror,-Wformat]
>  RANGEFMTARGS(p, maxDisclaimSize),
>  ^~~
> 
> Should I keep the casts, or change the type of `maxDisclaimSize, 
> numFullDisclaimsNeeded, lastDisclaimSize` to `const unsigned long`?

I would change them to size_t. Thanks for doing this.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1558012122


Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]

2024-04-08 Thread Thomas Stuefe
On Thu, 28 Mar 2024 17:15:26 GMT, Kevin Walls  wrote:

>>> In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard 
>>> since it guards a whole swathe of switches that we may instruct the 
>>> customer to enable. Once enabled, my experience is that 
>>> UnlockDiagnosticVMOptions often lingers around. It is not unusual for 
>>> customer scenarios to have set +UnlockDiagnosticVMOptions because of some 
>>> years ago support cases.
>> 
>> I think we also need to consider the flip side of this argument. Is this 
>> something that some customers might want to always enable, but don't want to 
>> always have UnlockDiagnosticVMOptions enabled. A new command line flag would 
>> be needed in that case.
>> 
>> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of 
>> diagnostic command line flags? Do we have examples of it enabling a hotspot 
>> feature that does not also require setting a diagnostic command line flag?
>
>> I think we also need to consider the flip side of this argument. Is this 
>> something that some customers might want to always enable, but don't want to 
>> always have UnlockDiagnosticVMOptions enabled. A new command line flag would 
>> be needed in that case.
>> 
>> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of 
>> diagnostic command line flags? Do we have examples of it enabling a hotspot 
>> feature that does not also require setting a diagnostic command line flag?
> 
> Thanks Chris - that is correct now I check the wording, 
> UnlockDiagnosticVMOptions: "Enable normal processing of flags relating to 
> field diagnostics"
> 
> Yes it makes flags which are DIAGNOSTIC available at the command-line.  We 
> have UnlockExperimentalVMOptions also.
> 
> My original reason for the guard, is that the risk of fooling  the "good oop" 
> check is that we could possibly crash (so that's answering your other 
> question).  Inspecting an arbitrary pointer that looks enough like a Java 
> heap object to fool the test, means it might try and resolve bad addresses 
> for Klass pointer or field data.
> 
> I also have a stress test which examines every address in a Java heap to test 
> for crashes.  That's obviously long-running as it does a jcmd for every 
> iteration.  But I can't currently get a crash, trying G1, ZGC, 
> ZGCGenerational.
> 
> Having a global guard to prevent usage of VM.inspect may be needed less than 
> I thought, but will wait on the other thread before saying that is really the 
> case.  We could have a new -XX flag to guard it (whether specific to this 
> command or a more general UnlockDiagnosticFeatures.  I would definitely stick 
> to this being in all builds, as we frequently "debug" non-debug builds.
> 
> Also, for your comment above about the new version of dbg_is_good_oop: this 
> was added because although there are not many callers of it, I did not want 
> to upset them.  During my earlier testing the detailed version of this method 
> did upset something, although I cannot reproduce that today.  
> 
> I will rerun some tests on these items...

> @kevinjwalls Thanks for clarifying the relevant behaviours and code paths as 
> well as identifying a few places where documentation might help devs avoid 
> tripping over any issues. Much appreciated.

@kevinjwalls I am fine with this as well. Thank you for answering our questions.

-

PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2042743210


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc

2024-03-29 Thread Thomas Stuefe
On Fri, 29 Mar 2024 07:56:10 GMT, Thomas Stuefe  wrote:

>> src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 50:
>> 
>>> 48:   #undef malloc
>>> 49:   extern void *malloc(size_t) asm("vec_malloc");
>>> 50: #endif
>> 
>> This `#if` is not needed if we are building on AIX 7.2 TL5 SP7 or higher. 
>> This is the case in our build environment, but I think IBM is still building 
>> with SP5, which would run into build errors if I remove this `#if` now.
>
> While looking at this, I noticed that my question in 
> https://github.com/openjdk/jdk/pull/14146#discussion_r1207078176 and 
> followups had never been answered. Do you know the answers now?
> 
> Quoting myself:
> 
>> So, we do this only for malloc? Not for calloc, posix_memalign, realloc etc? 
>> What about free?
>> Removing that define and hard-coding it here assumes ... pointers it returns 
>> work with the unchanged free() and realloc() the system provides, and will 
>> always do so.
>> I am basically worried that undefining malloc, even if it seems harmless 
>> now, exposes us to difficult-to-investigate problems down the road, since it 
>> depends on how the libc devs will reform those macros in the future.

Other than that, and kind of depending on your answer: How important is it that 
we catch every use of the original malloc? Can be safely mix the original 
malloc with vec_malloc if logging is not involved?

I am asking, because from that it depends whether this hunk needs to appear 
right behind `#include ` or whether we can move it into the middle of 
the file together with the other AIX stuff. 

Because, if we move it into the middle of the file, we may miss any uses of 
malloc that may happen in system headers (would be unusual for that to happen 
but with IBM one never knows).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544201412


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc

2024-03-29 Thread Thomas Stuefe
On Thu, 28 Mar 2024 16:57:00 GMT, Joachim Kern  wrote:

>> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building 
>> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect 
>> clang by another name, and it uses the clang toolchain in the JDK build. 
>> Thus the old xlc toolchain was removed by 
>> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701).
>> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the 
>> last xlc rudiment.
>> This means merging the AIX specific content of 
>> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp 
>> into the corresponding gcc files on the on side and removing the 
>> defined(TARGET_COMPILER_xlc) blocks in the code, because the 
>> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX 
>> compiler.
>> The rest of the changes are needed because of using 
>> utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about 
>> ill formatted printf
>
> src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 50:
> 
>> 48:   #undef malloc
>> 49:   extern void *malloc(size_t) asm("vec_malloc");
>> 50: #endif
> 
> This `#if` is not needed if we are building on AIX 7.2 TL5 SP7 or higher. 
> This is the case in our build environment, but I think IBM is still building 
> with SP5, which would run into build errors if I remove this `#if` now.

While looking at this, I noticed that my question in 
https://github.com/openjdk/jdk/pull/14146#discussion_r1207078176 and followups 
had never been answered. Do you know the answers now?

Quoting myself:

> So, we do this only for malloc? Not for calloc, posix_memalign, realloc etc? 
> What about free?
> Removing that define and hard-coding it here assumes ... pointers it returns 
> work with the unchanged free() and realloc() the system provides, and will 
> always do so.
> I am basically worried that undefining malloc, even if it seems harmless now, 
> exposes us to difficult-to-investigate problems down the road, since it 
> depends on how the libc devs will reform those macros in the future.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544199335


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc

2024-03-29 Thread Thomas Stuefe
On Fri, 29 Mar 2024 07:18:47 GMT, Thomas Stuefe  wrote:

>> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building 
>> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect 
>> clang by another name, and it uses the clang toolchain in the JDK build. 
>> Thus the old xlc toolchain was removed by 
>> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701).
>> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the 
>> last xlc rudiment.
>> This means merging the AIX specific content of 
>> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp 
>> into the corresponding gcc files on the on side and removing the 
>> defined(TARGET_COMPILER_xlc) blocks in the code, because the 
>> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX 
>> compiler.
>> The rest of the changes are needed because of using 
>> utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about 
>> ill formatted printf
>
> src/hotspot/os/aix/loadlib_aix.cpp line 120:
> 
>> 118:   (lm->is_in_vm ? '*' : ' '),
>> 119:   (uintptr_t)lm->text, (uintptr_t)lm->text + lm->text_len,
>> 120:   (uintptr_t)lm->data, (uintptr_t)lm->data + lm->data_len,
> 
> Please don't cast, use `p2i()`.

Check copyrights in this file and all others. Adapt SAP and Oracle copyrights.

> src/hotspot/os/aix/os_aix.cpp line 651:
> 
>> 649: lt.print("Thread is alive (tid: " UINTX_FORMAT ", kernel thread id: 
>> " UINTX_FORMAT
>> 650:  ", stack [" PTR_FORMAT " - " PTR_FORMAT " (" SIZE_FORMAT 
>> "k using %luk pages)).",
>> 651:  os::current_thread_id(), (uintx) kernel_thread_id, 
>> (uintptr_t)low_address, (uintptr_t)high_address,
> 
> Use p2i, not cast

Here, and in other places too where you cast a pointer to fit into PTR_FORMAT 
or INTPTR_FORMAT

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544172412
PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544181011


Re: RFR: JDK-8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc

2024-03-29 Thread Thomas Stuefe
On Thu, 28 Mar 2024 16:50:20 GMT, Joachim Kern  wrote:

> As of [JDK-8325880](https://bugs.openjdk.org/browse/JDK-8325880), building 
> the JDK requires version 17 of IBM Open XL C/C++ (xlc). This is in effect 
> clang by another name, and it uses the clang toolchain in the JDK build. Thus 
> the old xlc toolchain was removed by 
> [JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701).
> Now we also switch the HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc, removing the 
> last xlc rudiment.
> This means merging the AIX specific content of 
> utilities/globalDefinitions_xlc.hpp and utilities/compilerWarnings_xlc.hpp 
> into the corresponding gcc files on the on side and removing the 
> defined(TARGET_COMPILER_xlc) blocks in the code, because the 
> defined(TARGET_COMPILER_gcc) blocks work out of the box for the new AIX 
> compiler.
> The rest of the changes are needed because of using 
> utilities/compilerWarnings_xlc.hpp the compiler is much more nagging about 
> ill formatted printf

Hi @JoKern65 ,

this is a nice simplification!

There are many casts that should be unnecessary (casting size_t to fit into 
SIZE_FORMAT) or that should use `p2i` (all those casts to fit pointers into 
PTR_FORMAT or INTPTR_FORMAT). I did not mark all of them.

We also have a new macro for printing memory ranges, RANGEFMT and RANGEFMTARGS. 
Maybe some call sites could that instead and make the code shorter and better 
to read? But I leave that up to you.

Cheers, and happy Easter!

src/hotspot/os/aix/loadlib_aix.cpp line 120:

> 118:   (lm->is_in_vm ? '*' : ' '),
> 119:   (uintptr_t)lm->text, (uintptr_t)lm->text + lm->text_len,
> 120:   (uintptr_t)lm->data, (uintptr_t)lm->data + lm->data_len,

Please don't cast, use `p2i()`.

src/hotspot/os/aix/os_aix.cpp line 314:

> 312:   ErrnoPreserver ep;
> 313:   log_trace(os, map)("disclaim failed: " RANGEFMT " errno=(%s)",
> 314:  RANGEFMTARGS(p, (long)maxDisclaimSize),

Wait, why are these casts needed? maxDisclaimSize is size_t, RANGEFMT uses 
SIZE_FORMAT. That should work without cast.

src/hotspot/os/aix/os_aix.cpp line 651:

> 649: lt.print("Thread is alive (tid: " UINTX_FORMAT ", kernel thread id: 
> " UINTX_FORMAT
> 650:  ", stack [" PTR_FORMAT " - " PTR_FORMAT " (" SIZE_FORMAT "k 
> using %luk pages)).",
> 651:  os::current_thread_id(), (uintx) kernel_thread_id, 
> (uintptr_t)low_address, (uintptr_t)high_address,

Use p2i, not cast

src/hotspot/os/aix/os_aix.cpp line 1212:

> 1210: st->print_cr("physical free  : " SIZE_FORMAT, (unsigned 
> long)mi.real_free);
> 1211: st->print_cr("swap total : " SIZE_FORMAT, (unsigned 
> long)mi.pgsp_total);
> 1212: st->print_cr("swap free  : " SIZE_FORMAT, (unsigned 
> long)mi.pgsp_free);

A better way to do this would be to change AIX::meminfo to use size_t. We 
should have done this when introducing that API.

src/hotspot/os/aix/os_aix.cpp line 1399:

> 1397: os->print("[" PTR_FORMAT " - " PTR_FORMAT "] (" UINTX_FORMAT
> 1398:   " bytes, %ld %s pages), %s",
> 1399:   (uintptr_t)addr, (uintptr_t)addr + size - 1, size, size / 
> pagesize, describe_pagesize(pagesize),

p2i

src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 62:

> 60: #include 
> 61: 
> 62: #if defined(LINUX) || defined(_ALLBSD_SOURCE) || defined(_AIX)

What else is left? Could we just remove this line altogether now?

src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 83:

> 81:   #error "xlc version not supported, macro __open_xl_version__ not found"
> 82: #endif
> 83: #endif // AIX

Can probably be shortened like this:

Suggestion:

#ifdef _AIX
#if !defined(__open_xl_version__) || (__open_xl_version__ < 17)
  #error "this xlc version is not supported"
#endif
#endif // AIX

src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 103:

> 101: #endif
> 102: 
> 103: #if !defined(LINUX) && !defined(_ALLBSD_SOURCE) && !defined(_AIX)

I believe this whole section can be removed now.

At least I have no idea who this is for. What gcc versions does OpenJDK still 
support, then, beside these platforms. Also, any gcc platform not on linux or 
bsd would have hit the #error below at line 132.

src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 128:

> 126: #if defined(__APPLE__)
> 127: inline int g_isnan(double f) { return isnan(f); }
> 128: #elif defined(LINUX) || defined(_ALLBSD_SOURCE) || defined(_AIX)

I think this can be just #else

src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 131:

> 129: inline int g_isnan(float  f) { return isnan(f); }
> 130: inline int g_isnan(double f) { return isnan(f); }
> 131: #else

and this can be removed

-

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18536#pullrequestreview-1967997963
PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544171741
PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1544174303
PR Review 

Re: jmx-dev RFR: 8327505: Test com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.java fails

2024-03-28 Thread Thomas Stuefe
On Tue, 19 Mar 2024 16:23:27 GMT, Kevin Walls  wrote:

> Client.java has a fixed 30-second timeout on the CountDownLatch to wait for 
> 10 notifications.
> 
> If it fails, you can't tell if CountDownLatch.await threw, or returned false 
> and the app threw InterruptedException, due to the way Client.java handles 
> these.
> 
> Seems most likely the 30 second wait expired, as we are dealing with -Xcomp 
> failures in a debug build.  Passing runs can take a few seconds, but can be 
> 25 seconds.
> 
> Increasing the timeout and tidying up the handling so we can see the specific 
> reason in future.

Looks good!

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18381#pullrequestreview-1965321788


Re: RFR: 8327505: Test com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.java fails

2024-03-28 Thread Thomas Stuefe
On Tue, 19 Mar 2024 16:23:27 GMT, Kevin Walls  wrote:

> Client.java has a fixed 30-second timeout on the CountDownLatch to wait for 
> 10 notifications.
> 
> If it fails, you can't tell if CountDownLatch.await threw, or returned false 
> and the app threw InterruptedException, due to the way Client.java handles 
> these.
> 
> Seems most likely the 30 second wait expired, as we are dealing with -Xcomp 
> failures in a debug build.  Passing runs can take a few seconds, but can be 
> 25 seconds.
> 
> Increasing the timeout and tidying up the handling so we can see the specific 
> reason in future.

Looks good!

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18381#pullrequestreview-1965321788


Re: RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v10]

2024-03-28 Thread Thomas Stuefe
On Wed, 27 Mar 2024 13:44:42 GMT, Matthias Baesken  wrote:

>> Currently jcmd command GC.heap_dump only works with an additionally provided 
>> file name.
>> Syntax : GC.heap_dump [options] 
>> 
>> In case the JVM has the XX - flag HeapDumpPath set, we should support an 
>> additional mode where the  is optional.
>> In case the filename is NOT set, we take the HeapDumpPath (file or 
>> directory);
>> 
>> new syntax :
>> GC.heap_dump [options]  .. has precedence over second option
>> GC.heap_dump [options] …in case -XX: HeapDumpPath=p is set
>> 
>> This would be a simplification e.g. for support cases where a filename or 
>> directory is set at JVM startup with -XX: HeapDumpPath=p and writing to the 
>> path is intended/recommended for usage also in the jcmd case.
>
> Matthias Baesken has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   adjust java.1 man page

To weight in on the side of this patch, in most customer scenarios I have seen, 
there is just one location for heap dumps. Using the same location for OOM and 
for manual heap dumps seems logical to me.

@dholmes-ora 
> Notwithstanding that there may be cases where this change would be 
> convenient, I really don't like this coupling between the jcmd behaviour and 
> a -XX flag that is intended for something else. It doesn't completely mesh 
> with the jcmd usage and other options, and the documentation story is quite 
> complicated.

Wouldn't this just be a case of changing a flag description? As luck has it, 
the flag already has a generic name that is not tied to OOMs.

-

PR Comment: https://git.openjdk.org/jdk/pull/18190#issuecomment-2024535044


Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]

2024-03-27 Thread Thomas Stuefe
On Tue, 26 Mar 2024 21:55:38 GMT, Kevin Walls  wrote:

>> Introduce the jcmd "VM.inspect" to implement access to detailed JVM object 
>> information.
>> 
>> Not recommended for live production use.  Requires UnlockDiagnosticVMOptions 
>> and not included in jcmd help output, to remind us this is not a 
>> general-purpose customer-facing tool.
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Undo include

I am still feeling uneasy about this new command. I can see its potential 
usefulness, but as stated before would prefer it being limited to debug-only 
JVMs.

- jcmd can be exposed remotely, e.g., via jmx. I am not sure whether other 
solutions exist, but exposing jcmd beyond the immediate box the JVM runs on is 
something many folks want, and that is technically easy to do. So, 
customer-local solutions may exist for that, and the reach of jcmd may be 
larger than we think.
- the underlying functionality for print_location was written with debugging 
and error analysis in mind. We keep adding functionality there. I don't think 
developers adding to that function have in mind that this functionality may be 
exposed, possibly remotely.

In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard since 
it guards a whole swathe of switches that we may instruct the customer to 
enable. Once enabled, my experience is that UnlockDiagnosticVMOptions often 
lingers around. It is not unusual for customer scenarios to have set 
+UnlockDiagnosticVMOptions because of some years ago support cases.

If others feel this command is safe enough, I'll shut up and be quiet, since I 
cannot think up a concrete attack scenario.

-

PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2022310874


Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]

2024-03-19 Thread Thomas Stuefe
On Mon, 18 Mar 2024 13:14:41 GMT, Richard Reingruber  wrote:

>> This pr changes  `JfrJvmtiAgent::retransform_classes()` and 
>> `jfr_set_enabled` to switch to `WXWrite` before transitioning to the vm.
>> 
>> Testing:
>> make test TEST=jdk/jfr/event/runtime/TestClassLoadEvent.java 
>> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync
>> make test TEST=compiler/intrinsics/klass/CastNullCheckDroppingsTest.java 
>> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync
>> 
>> More tests are pending.
>
> Richard Reingruber has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains three additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into 8327990_jfr_enters_vm_without_wxwrite
>  - Set WXWrite at more locations
>  - Switch to WXWrite before entering the vm

I think this patch makes sense, and does not compete with a long-term solution.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18238#pullrequestreview-1947069985


Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]

2024-03-19 Thread Thomas Stuefe
On Tue, 19 Mar 2024 12:17:22 GMT, David Holmes  wrote:

>> Instead, could we tag code that needs one or the other, keep track of the 
>> current WX state in thread-local memory, and flip WX only when we know we 
>> need to? 

The first part we already do.

I wonder wheter we could - at least as workaround for if we missed a spot - do 
wx switching as a reaction to a SIBBUS related to WX violation in code cache. 
Switch state around, return from signal handler and retry operation.

(Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered 
in the kernel thread WX state had already been processed somehow).

> > That's very odd. The example there doesn't even involve MAP_JIT memory, so 
> > what does it have to do with WX?
> 
> @theRealAph that is the mystery we hope will be resolved once we know the 
> nature of the underlying OS bug. Somehow switching to exec mode 
> fixes/works-around the issue. I can imagine a missing conditional to check if 
> the region is MAP_JIT.
> 
> > Changing WX at VM state transitions is a form of temporal coupling, a 
> > classic design smell that has caused problems for decades.
> 
> The original introducers of WXEnable made the decision that the VM should be 
> in WRITE mode unless it needs EXEC. That is the state we are presently trying 
> to achieve with this change. If that original design choice is wrong then ...
> 
> > Instead, could we tag code that needs one or the other, keep track of the 
> > current WX state in thread-local memory, and flip WX only when we know we 
> > need to?
> 
> And I've asked about this every time a missing WXEnable has had to be added. 
> We seem to be generically able to describe what kind of code needs which 
> mode, but we seem to struggle to pin it down. Though that is what 
> https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.
> 
> > That'd (by definition) reduce the number of transitions to the minimum if 
> > we were through.
> 
> Not necessarily. It may well remove some transitions from paths that don't 
> need it, but if you move the state change too low down the call chain you 
> could end up transitioning much more often in code that does need it e.g. if 
> a transitioning method is called in a loop. 

Not if you do the switching lazily. The first iteration would switch to the 
needed state; subsequent iterations would not do anything since the state 
already matches. Unless you interleave writes and execs, but then you would 
need the state changes anyway.

-

PR Comment: https://git.openjdk.org/jdk/pull/18238#issuecomment-2007469410


Re: RFR: 8327675: jspawnhelper should be built on all unix platforms

2024-03-08 Thread Thomas Stuefe
On Fri, 8 Mar 2024 10:17:08 GMT, Magnus Ihse Bursie  wrote:

> We should match the building of jspawnhelper in 
> make/modules/java.base/Launcher.gmk with the usage for all unix platforms in 
> src/java.base/unix/classes/java/lang/ProcessImpl.java.
> 
> Granted, the list of OSes in the makefile amounts to the current list of all 
> Unix OSes in the JDK, but there is no need to have this logical disparity. 
> 
> This was inspired by the discovery in 
> https://github.com/openjdk/jdk/pull/18112#discussion_r1517455696.

+1

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18165#pullrequestreview-1924593320


Re: RFR: 8327675: jspawnhelper should be built on all unix platforms

2024-03-08 Thread Thomas Stuefe
On Fri, 8 Mar 2024 10:17:08 GMT, Magnus Ihse Bursie  wrote:

> We should match the building of jspawnhelper in 
> make/modules/java.base/Launcher.gmk with the usage for all unix platforms in 
> src/java.base/unix/classes/java/lang/ProcessImpl.java.
> 
> Granted, the list of OSes in the makefile amounts to the current list of all 
> Unix OSes in the JDK, but there is no need to have this logical disparity. 
> 
> This was inspired by the discovery in 
> https://github.com/openjdk/jdk/pull/18112#discussion_r1517455696.

+1

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18165#pullrequestreview-1924593320


Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v7]

2024-03-06 Thread Thomas Stuefe
On Tue, 5 Mar 2024 11:31:13 GMT, Kevin Walls  wrote:

>> Introduce the jcmd "VM.debug" to implement access to a useful set of the 
>> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...".
>> 
>> Not recommended for live production use.  Calling these "debug" utilities, 
>> and not including them in the jcmd help output, is to remind us they are not 
>> general customer-facing tools.
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Test update

Hi Kevin,

> 
> (If you find existing jcmds a mess, feel free to suggest changes as 
> appropriate.)

you touch on a problem here, and why I think adding commands should be done 
more carefully (and I am guilty of adding commands too). 

Once these commands are rolled out, they find themselves in blogs, knowledge 
bases, scripts that are reused for different JDK releases, and so on. It is 
difficult to change them post-release. That is why good names are important.

Cheers, Thomas

-

PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1980807020


Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]

2024-03-05 Thread Thomas Stuefe
On Mon, 4 Mar 2024 22:02:53 GMT, Kevin Walls  wrote:

>> Kevin Walls has updated the pull request incrementally with three additional 
>> commits since the last revision:
>> 
>>  - Usage correction
>>  - Help to clarify this is VM inspection.  Comment to relate source to 
>> debug.cpp.
>>  - jcheck trailing whitespace
>
> Hi Thomas @tstuefe -
> 
> Security concerns certainly needed some thought. We have remote access to 
> DiagnosticCommands over JMX.  I don't see a particular need for that for this 
> command.  I think it was keeping the DCmd flagged as hidden that hides it 
> from the list of dcmds available in that way.  So we have the attach api 
> controls using local user identity, as we do for everything else.
> 
> Yes, "events" is not that useful, I should remove it.  I was taking the 
> useful parts of debug.cpp, and although the events were in VM.info, I had 
> missed that since your JDK-8224600, VM.events is its own command now!
> 
> Thread.print with stacks, vs "VM.debug threads" which is just a thread list, 
> so you have e.g. JavaThread*, name, native id, stack range.
> 
> "find": Usage at the moment, would likely be MessageBoxOnError, and a native 
> debugger to get the native stacktrace and parameters that include an object 
> reference you care about.  It might be other jcmds if e.g. events capture a 
> relevant problematic pointer, but likely it involves a native debugger.
> Using jcmd behaves better, showing the output where you run jcmd, not in the 
> VM's current output as the debug.cpp utils do.
> Native debugger syntax to call abritrary functions can be awkward, 
> particularly on Windows, so the jcmd should be a better experience.
> 
> This might be more compelling in post-mortem usage.  Am working on that.  
> i.e. jcmd on a core file.  But I am saying it offers some value today.
> 
> The class/method finders, I've heard some enthusiasm for their inclusion.  We 
> don't want to encourage overlap but yes we do have some overlapping jcmds.
> 
> 
> This is an umbrella but I don't think it's vague. VMDebugDcmd is for 
> inspecting VM state.  It's inspired by debug.cpp utilities, does not need to 
> implement all of them, but does aim to make them more accessible (I will 
> assert that they are not well known, which is hard to prove.)
> 
> Do we have a problem with jcmd feature creep?  If anything has crept too far 
> it can be addressed. Looks like the DCmd classes have approximately doubled 
> since jdk8u but this looks like growth in a good way.
> 
> Thanks
> Kevin

Hi Kevin, @kevinjwalls ,

thank you for your explanations. Please find answers inline.

> Security concerns certainly needed some thought. We have remote access to 
> DiagnosticCommands over JMX. I don't see a particular need for that for this 
> command. I think it was keeping the DCmd flagged as hidden that hides it from 
> the list of dcmds available in that way. So we have the attach api controls 
> using local user identity, as we do for everything else.
> 
> Yes, "events" is not that useful, I should remove it. I was taking the useful 
> parts of debug.cpp, and although the events were in VM.info, I had missed 
> that since your JDK-8224600, VM.events is its own command now!
> 
> Thread.print with stacks, vs "VM.debug threads" which is just a thread list, 
> so you have e.g. JavaThread*, name, native id, stack range.
> 
> "find": Usage at the moment, would likely be MessageBoxOnError, and a native 
> debugger to get the native stacktrace and parameters that include an object 
> reference you care about. It might be other jcmds if e.g. events capture a 
> relevant problematic pointer, but likely it involves a native debugger. Using 
> jcmd behaves better, showing the output where you run jcmd, not in the VM's 
> current output as the debug.cpp utils do. Native debugger syntax to call 
> abritrary functions can be awkward, particularly on Windows, so the jcmd 
> should be a better experience.

I remain sceptic here, because in my experience, once you start poking at the 
JVM innards at this level, I guess you will be quickly at the limit of what 
this command can do for you and need to attach a debugger anyway.

Could this be an own command, e,g, `VM.inspect`, and possibly limited to debug 
VMs? Do we really need this feature in production?

> 
> This might be more compelling in post-mortem usage. Am working on that. i.e. 
> jcmd on a core file. But I am saying it offers some value today.

Don't we have jhsdb for that?

> 
> The class/method finders, I've heard some enthusiasm for their inclusion. We 
> don't want to encourage overlap but yes we do have some overlapping jcmds.

Yeah, I can see this being useful.

> 
> This is an umbrella but I don't think it's vague. VMDebugDcmd is for 
> inspecting VM state. It's inspired by debug.cpp utilities, does not need to 
> implement all of them, but does aim to make them more accessible (I will 
> assert that they are not well known, which is hard to prove.)
> 
> Do we have 

Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]

2024-03-04 Thread Thomas Stuefe
On Mon, 4 Mar 2024 15:10:12 GMT, Kevin Walls  wrote:

>> Introduce the jcmd "VM.debug" to implement access to a useful set of the 
>> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...".
>> 
>> Not recommended for live production use.  Calling these "debug" utilities, 
>> and not including them in the jcmd help output, is to remind us they are not 
>> general customer-facing tools.
>
> Kevin Walls has updated the pull request incrementally with three additional 
> commits since the last revision:
> 
>  - Usage correction
>  - Help to clarify this is VM inspection.  Comment to relate source to 
> debug.cpp.
>  - jcheck trailing whitespace

Of the sub commands given in the CSR:


eventsHotSpot JVM Event log
threads   Thread list
find ADDRESS  Describe an address given by the argument
findclass CLASS_PATTERN FLAGS   
findmethod CLASS_PATTERN METHOD_PATTERN FLAGS

1) `events` is redundant (we have VM.events, and events are printed as part of 
VM.info)
2) `threads` is handled by Thread.info
3) I am still puzzled about `find`. This is to ask a running VM about an 
arbitrary address. But how do you get such an address? Usually only via 
debugging. Then I am already attached with a system debugger. Under which 
circumstances would this be needed? Can you give a usage example?
4) and 5) shared some overlap with VM.classes, VM.class_hierarchy, VM.metaspace 
and various Compiler.xx commands. 

I am mostly worried about adding such a low-level debug command to jcmd. 
Especially in such a vague umbrella form. Experiences show that these commands 
get extended easily, often without CSR. So its an open door for feature creep. 

I mostly worry about unforeseen security consequences, especially if jcmd reach 
is extended across machine boundaries. Weighted against that, the benefits seem 
a bit slim to me. Again, real-world use cases would help.

-

PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1976941359


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v26]

2024-02-26 Thread Thomas Stuefe
On Mon, 26 Feb 2024 11:24:13 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>Address review comments

Okay

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1902581556


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v25]

2024-02-24 Thread Thomas Stuefe
On Tue, 20 Feb 2024 09:27:15 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>remove space

Changes requested by stuefe (Reviewer).

src/hotspot/os/aix/os_aix.cpp line 1167:

> 1165:  Load "libfilename.so" first, then load libfilename.a, on failure.
> 1166:  In OpenJ9, the libary with .so extension is loaded first and then .a 
> extension, on failure.
> 1167: */

Wrong block comment, but the comment itself is also off now. Suggestion:

Suggestion:

// Load library named 
// If filename matches .so, and loading fails, repeat with .a.

src/hotspot/os/aix/os_aix.cpp line 1173:

> 1171:   char* const pointer_to_dot = strrchr(file_path, '.');
> 1172:   char const *old_extension = ".so";
> 1173:   char const *new_extension = ".a";

Suggestion:

  char* const file_path = strdup(filename);
  char* const pointer_to_dot = strrchr(file_path, '.');
  const char old_extension[] = ".so";
  const char new_extension[] = ".a";
  STATIC_ASSERT(sizeof(old_exception) >= sizeof(new_exception));

and remove runtime-assert below

src/hotspot/os/aix/os_aix.cpp line 1178:

> 1176: FREE_C_HEAP_ARRAY(char, file_path);
> 1177: return result;
> 1178:   }

I would remove this whole section since it's a functional change not covered by 
the issue description and unnecessary for your fix. 

It may also be wrong: loading shared objects without extension may be perfectly 
valid. The extension is just a convention.

src/hotspot/os/aix/os_aix.cpp line 1183:

> 1181:   // If the load fails,we try to reload by changing the extension to .a 
> for .so files only.
> 1182:   // Shared object in .so format dont have braces, hence they get 
> removed for archives with members.
> 1183:   if (result == nullptr) {

Suggestion:

  if (result == nullptr && pointer_to_dot != nullptr && strcmp(pointer_to_dot, 
old_extension) == 0) {

src/hotspot/os/aix/os_aix.cpp line 1184:

> 1182:   // Shared object in .so format dont have braces, hence they get 
> removed for archives with members.
> 1183:   if (result == nullptr) {
> 1184: assert(strlen(new_extension) < strlen(old_extension), "New 
> extension length must be less than existing one");

Can be removed if you do the STATIC_ASSERT suggested above.

src/hotspot/os/aix/os_aix.cpp line 1186:

> 1184: assert(strlen(new_extension) < strlen(old_extension), "New 
> extension length must be less than existing one");
> 1185: if (strcmp(pointer_to_dot, old_extension) == 0) {
> 1186:   sprintf(pointer_to_dot, "%s", new_extension);

Use os::snprintf, and limit output buffer by size of old extension.

-

PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1899612274
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757595
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754431
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754774
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757736
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757772
PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757844


Re: RFR: JDK-8252802: java launcher should set MALLOCOPTIONS and LDR_CNTRL in order to use 64KB pages on AIX

2024-02-19 Thread Thomas Stuefe
On Mon, 19 Feb 2024 05:52:22 GMT, Varada M  wrote:

> DeCapo Benchmark Results (3 runs) :  
> 
> Before : 
> = DaCapo 9.12 h2 PASSED in 281402 msec =
> = DaCapo 9.12 h2 PASSED in 269818 msec =
> = DaCapo 9.12 h2 PASSED in 279279 msec =
>  
> After:
> = DaCapo 9.12 h2 PASSED in 279192 msec =
> = DaCapo 9.12 h2 PASSED in 269769 msec =
> = DaCapo 9.12 h2 PASSED in 271577 msec = 
> 
> Environmental variables LDR_CNTRL and MALLOCOPTIONS has caused 
> test/jdk/java/lang/ProcessBuilder/Basic.java failure.
> Additional environmental variables has to removed from 
> removeAixExpectedVars().
> 
> JBS Issue : [JDK-8252802](https://bugs.openjdk.org/browse/JDK-8252802)

I don't think this works as intended. 

IIRC, both variables must be set *before process invocation*. Setting them 
inside the launcher will only affect child processes. The dacapo results posted 
seem to support this, they seem random-ish to me.

Note that we already use 64KB pages for all large memory regions (everything 
that goes through  `os::reserve_memory`). So, while the value of LDRCNTRL is 
not nil, it is very diminished. Mostly just the Heap is affected.

But I would not change defaults for options like these anyway, especially not 
hard-coded. These knobs have far-ranging implications. If we want that, we need 
to investigate carefully.

(for example: would using 64KB pages increase average heap size? Possibly, 
depending on the implementation. But AIX still has this problem where the heap 
can only live in the break and can bump against low-hanging regions. So, AIX is 
especially vulnerable against any changes that increase average heap usage)

-

PR Comment: https://git.openjdk.org/jdk/pull/17906#issuecomment-1953643518


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v23]

2024-02-19 Thread Thomas Stuefe
On Mon, 19 Feb 2024 13:25:28 GMT, Joachim Kern  wrote:

>> Thanks for the detailed explanation @JoKern65 . Do then in this errno check 
>> may not be necessary ? or can we still set errno and access it some way ?
>
> In this special case here I would not use errno, but the string returned in 
> ebuf, in case the result is nullptr.

Argh, I keep forgetting `dlopen` does not set errno (at least its not 
guaranteed by the standard).

Ok, in this case, I would not analyze the string either because it may be 
locale-dependent.

Never mind then; I'd thought this would be easy. One could probably use load() 
instead of dlopen() on AIX, but that change would require more investigations 
to see if load() and dlopen() are equivalent.

Okay, in that case just skip the errno test and call the second dlopen 
directly. Lets hope for the best then.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1494586044


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]

2024-02-16 Thread Thomas Stuefe
On Fri, 16 Feb 2024 12:25:39 GMT, Suchismith Roy  wrote:

> > > > Hi,
> > > > some remarks:
> > > > 
> > > > * there is no need for a copy for the first call to dll_load_library. 
> > > > Just hand in the string 1:1.
> > > > * I would only do the *.a fallback loading if the error indicates that 
> > > > the *.so file had not been there. So, only if EACCESS or ENOENT; in all 
> > > > other cases I would not do the fallback. E.g. if the *.so file cannot 
> > > > be loaded due to a header mismatch. See 
> > > > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines
> > > > * Please use os::strdup.
> > > > * Please assert that the replacement string is smaller than the 
> > > > original string (which it should be, *.so is longer than *.a, but this 
> > > > is insurance against anyone changing the code in the future)
> > > > 
> > > > Thank you, Thomas
> > > 
> > > 
> > > Sure working on them. May i know why we are using the load routine in the 
> > > 2nd point ? . Currently we do a *.a fallback only when dlopen fails. Does 
> > > load function save some steps here ?
> > 
> > 
> > I don't understand the question, sorry.
> > What I mean is when the first dlopen fails AND its error indicates the 
> > shared library had been missing, only then attempt the *.a fallback.
> 
> I see. I think i was referred to the init routine in the link.

Ah, okay. No, IBMs dlopen page for AIX refers to this page for the list of 
possible error codes; I assume dlopen just uses these functions under the hood.

> So you mean based on the errors inside dll_load_library, we set the errno to 
> appropriate Error and then check for that before using the fallback , is that 
> correct ?

No. I mean distinguish whether dlopen fails because it could not find the file 
(a valid reason to retry with *.a) or failed for another reason (which would 
not be a valid reason to retry with *.a).

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-194840


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]

2024-02-15 Thread Thomas Stuefe
On Thu, 15 Feb 2024 17:50:23 GMT, Suchismith Roy  wrote:

> > Hi,
> > some remarks:
> > 
> > * there is no need for a copy for the first call to dll_load_library. Just 
> > hand in the string 1:1.
> > * I would only do the *.a fallback loading if the error indicates that the 
> > *.so file had not been there. So, only if EACCESS or ENOENT; in all other 
> > cases I would not do the fallback. E.g. if the *.so file cannot be loaded 
> > due to a header mismatch. See 
> > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines
> > * Please use os::strdup.
> > * Please assert that the replacement string is smaller than the original 
> > string (which it should be, *.so is longer than *.a, but this is insurance 
> > against anyone changing the code in the future)
> > 
> > Thank you, Thomas
> 
> Sure working on them. May i know why we are using the load routine in the 2nd 
> point ? . Currently we do a *.a fallback only when dlopen fails. Does load 
> function save some steps here ?

I don't understand the question, sorry.

What I mean is when the first dlopen fails AND its error indicates the shared 
library had been missing, only then attempt the *.a fallback.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1947780121


Re: RFR: 8325878: Require minimum Clang version 13

2024-02-15 Thread Thomas Stuefe
On Fri, 16 Feb 2024 04:46:24 GMT, Kim Barrett  wrote:

> > Unfortunately this will break my workflow on Linux - I use clang to build 
> > on Ubuntu 20.04, which is not that old, but it ships with clang 12. This is 
> > not a deal breaker, just annoying.
> 
> That's unfortunate, but I think the [[noreturn]] issue is very important. 
> Until we have that fix, we can't rely on that attribute to silence certain 
> warnings. This requires us to continue to insert dead returns or apply other 
> workarounds. And forgetting to do so (because someone makes a change that 
> works fine with later versions or on other platforms) will lead to 
> build-breakage JBS issues/PRs just to continue to support older versions of 
> clang. We've already had several of those.

Thank you for explaining the motivation. @forax also mentioned that our 
workaround produces false warnings in IDEs (just checked in CDT). I'm fine with 
it - I will find another solution for my Linux box.

-

PR Comment: https://git.openjdk.org/jdk/pull/17862#issuecomment-1947776707


Re: RFR: 8314488: Compile the JDK as C++17 [v6]

2024-02-15 Thread Thomas Stuefe
On Mon, 5 Feb 2024 09:44:02 GMT, Magnus Ihse Bursie  wrote:

> > Sure, you can always install a newer GCC than the system one, but it's 
> > another thing that makes it harder for people to build OpenJDK. Having said 
> > that, C++17 is nice to have.
> 
> @theRealAph I am still just hearing hand-waving "perhaps someone somewhere 
> will have a somewhat harder time building the JDK". Yes, perhaps that is so. 
> But that is very uncertain, and I have still not heard a single concrete 
> example of where this would apply. In contrast, going to gcc 10 will clearly 
> bring a benefit to all other platforms, since it allows us to synchronize the 
> code base at C++17.
> 
> In light of this, I think we need to hear some really compelling evidence of 
> problems that would ensue if we raise the minimum to gcc 10. If nobody can 
> produce such evidence, then to me it is a sign that this fear is not 
> well-grounded, and we should proceed with this PR.

@magicus You put the onus of proving that problems could ensue strictly to the 
objectors. That is a bit one-sided. I do not see much effort - any, really - 
put into detailing the motivation for this move, neither in the PR description 
nor in the JBS issue text. I just read through the whole PR discussion and 
really the only helpful comment I found was from Kim ( 
https://github.com/openjdk/jdk/pull/14988#issuecomment-1858136247 ).

I think important decisions like enforcing C++17 would benefit from a more 
careful preparation.

-

PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946628523


Re: RFR: 8314488: Compile the JDK as C++17 [v6]

2024-02-15 Thread Thomas Stuefe
On Thu, 15 Feb 2024 15:54:56 GMT, Magnus Ihse Bursie  wrote:

> > I would like it if toolchain version bumps were discussed somewhere else 
> > first, not in a PR. (And apologies if it was and I missed that discussion).
> 
> Yes, it definitely was. I posted a separate [mail to 
> build-dev](https://mail.openjdk.org/pipermail/build-dev/2024-January/042802.html)
>  with subject "Raising the minimum version of gcc, clang and xlc". I don't 
> think we can make it more clear than that.

Okay, thank you for that clarification. I clearly missed it then.

-

PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946396577


Re: RFR: 8314488: Compile the JDK as C++17 [v6]

2024-02-15 Thread Thomas Stuefe
On Thu, 11 Jan 2024 13:23:45 GMT, Julian Waters  wrote:

>> Compile the JDK as C++17, enabling the use of all C++17 language features
>
> Julian Waters has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Require clang 13 in toolchain.m4

Just on a general note:

I would like it if toolchain version bumps were discussed somewhere else first, 
not in a PR. (And apologies if it was and I missed that discussion). Because 
PRs are usually followed only by active developers, but toolchain versions 
affect a broader part of the community. As we have seen in this PR, there are 
conflicting interests.

We have things like CSRs and JEPs. Both are not ideal - a JEP is way too 
massive, and a CSR is about the compatibility of the product, not about 
build-time. Still, maybe something like a CSR would make sense here.

I understand that with toolchain support, there will always be opposing 
parties, and I can see arguments on both sides. We don't want to end up like 
FreeBSD - stuck in time because of opinion stalemates. I just keep thinking 
that a PR is maybe not the perfect forum for this.

-

PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1946053820


Re: RFR: 8325878: Require minimum Clang version 13

2024-02-15 Thread Thomas Stuefe
On Thu, 15 Feb 2024 05:19:45 GMT, Kim Barrett  wrote:

> Please review this change that updates the minimum supported version of Clang
> to be used for building OpenJDK from 3.5 to 13.
> 
> This permits enabling C++17 (JDK-8314488), though Clang 5 might suffice for
> that. A minimum of Clang 13 also obtains a critical bug fix for the 
> [[noreturn]]
> attribute (see JDK-8303805).
> 
> Testing: mach5 tier1, which includes building with a recent version of 
> Xcode/clang.

Unfortunately this will break my workflow on Linux - I use clang to build on 
Ubuntu 20.04, which is not that old, but it ships with clang 12. This is not a 
deal breaker, just annoying.

-

PR Comment: https://git.openjdk.org/jdk/pull/17862#issuecomment-1946030283


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]

2024-02-15 Thread Thomas Stuefe
On Mon, 12 Feb 2024 18:04:21 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove not matched trailing whitespaces

Hi,

some remarks:

- there is no need for a copy for the first call to dll_load_library. Just hand 
in the string 1:1.
- I would only do the *.a fallback loading if the error indicates that the *.so 
file had not been there. So, only if EACCESS or ENOENT; in all other cases I 
would not do the fallback. E.g. if the *.so file cannot be loaded due to a 
header mismatch. See 
https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines
- Please use os::strdup.
- Please assert that the replacement string is smaller than the original string 
(which it should be, *.so is longer than *.a, but this is insurance against 
anyone changing the code in the future)

Thank you, Thomas

-

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1882051726


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v13]

2024-02-01 Thread Thomas Stuefe
On Wed, 31 Jan 2024 13:17:21 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   spelling

I'm busy with FOSDEM this week and probably next. Will look at this end of next 
week or the week after.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1921004805


Re: RFR: 8307977: jcmd and jstack broken for target processes running with elevated capabilities

2024-01-30 Thread Thomas Stuefe
On Tue, 30 Jan 2024 10:47:22 GMT, Sebastian Lövdahl  wrote:

> 8307977: jcmd and jstack broken for target processes running with elevated 
> capabilities

ping @jerboaa

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1916676356


Re: RFR: 8324659: GHA: Generic jtreg errors are not reported

2024-01-26 Thread Thomas Stuefe
On Thu, 25 Jan 2024 12:02:35 GMT, Aleksey Shipilev  wrote:

> When looking at [JDK-8324647](https://bugs.openjdk.org/browse/JDK-8324647), I 
> was surprised to see that GHA has this output:
> 
> 
> Running test 'jtreg:test/lib-test:tier1'
> /home/runner/work/jdk/jdk/test/lib-test/TEST.groups: group all: group 
> includes itself
> Error: One or more groups are invalid
> Finished running test 'jtreg:test/lib-test:tier1'
> Test report is stored in 
> build/run-test-prebuilt/test-results/jtreg_test_lib_test_tier1
> 
> ==
> Test summary
> ==
>TEST TOTAL PASS FAIL ERROR
>>> jtreg:test/lib-test:tier1 1 0 0 1 <<
> ==
> TEST FAILURE
> 
> 
> ...and yet, the tests are recorded as "green"! I think this is because our 
> error reporting code is tad buggy. AFAICS, `make test-prebuilt` does not 
> report a non-zero exit code on failure, and the external script looks for 
> `build/run-test-prebuilt/make-support/exit-with-error` to check for error. 
> But it does not _set_ the exit code on its own when the failure is 
> discovered, neither it sets `failure=true` on generic failure.
> 
> I think at very minimum we should report `failure=true` on generic failure. I 
> thought about reporting non-zero exit code from the parsing script, but it 
> does not feel clean to return non-zero exit code from the _parsing script_. 
> Non-zero exit code from that script should signify the error in the script 
> itself.
> 
> Additional testing:
>  - [x] Current GHA with lib-test failure (now red)
>  - [x] Current GHA with lib-test fix (now green)
>  - [x] GHA with artificial errors: 
> https://github.com/shipilev/jdk/actions/runs/7652797646/job/20854472816

Ouch. Wow. +1. 

Is this problem also in older releases?

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17568#pullrequestreview-1845283123


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]

2024-01-26 Thread Thomas Stuefe
On Thu, 25 Jan 2024 11:04:03 GMT, Suchismith Roy  wrote:

> > For me the unresolved question is still:
> > 
> > * do we want an unconditional load of *.a for a given *.so (have yet to see 
> > any documentation for this a-file duality)
> 
> Yes. The documentation link - 
> https://www.ibm.com/docs/en/aix/7.3?topic=memory-shared-objects-run-time-linking
>  The text **In dynamic mode, input files specified with the -l flag may end 
> in .so, as well as in .a. That is, a reference to -lfoo is satisfied by the 
> first libfoo.so or libfoo.a found in any of the directories being searched. 
> Dynamic mode is in effect by default unless the -bstatic option is used.**
> 
> https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command
> 
> Archive files are composite objects, which usually contain import files and 
> object files, including shared objects. If an archive file contains another 
> archive file or a member whose type is not recognized, the ld command issues 
> a warning and ignores the unrecognized member. If an object file contained in 
> an archive file has the F_LOADONLY bit set in the XCOFF header, the ld 
> command ignores the member. This bit is usually used to designate old 
> versions of shared objects that remain in the archive file to allow existing 
> applications to load and run. New applications link with the new version of 
> the shared object, that is, another member of the archive.

Excellent, thank you.

> 
> > * if we do, do we want that to be bidirectional? Someone specifies *.a, do 
> > we want to attempt to load *.so?
> 
> Considering the different scenarios, loading .a after .so failure should 
> suffice. I got a chance to look at the right file in OpenJ9-omr ,which has a 
> native code which does an attempt to load archive files after trying to load 
> .so files. This code was always there and it explains why the issue did not 
> occur in Semeru, which is derived from this repository.

Okay. We don't have to be better than J9 then. If they do it, we should too.

So, for the following input, we do:

"library.so" -> load "library.so", then "library.a"
"library"-> load "library.so", then "library.a" ?
"library.a"  -> only load "library.a" ?

(*)

> 
> > When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I 
> > would like a clear documentation as a comment in os_aix.cpp explaining the 
> > logic and referencing the relevant OpenJ9 files.
> 
> Any example comment you can refer ? I mean i just mention the file name in 
> OpenJ9 and explain the logic ? Let me know for any further clarifications

Just reference the excerpts you mentioned above, then describe your intended 
logic. Example:

"When loading .so, upon failure we attempt to load . When loading a 
library given without extension, ..."

Explaining the logic makes it easy to see for the casual code reader what your 
intent is, that you have thought of all cases (*), and makes it possible to 
check the coding against your intent.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1911652995


Re: jmx-dev RFR: JDK-8324637: [aix] Implement support for reporting swap space in jdk.management

2024-01-25 Thread Thomas Stuefe
On Thu, 25 Jan 2024 12:30:15 GMT, Matthias Baesken  wrote:

> The get_total_or_available_swap_space_size coding misses AIX support, we only 
> return 0. This should be enhanced.
> The perfstat API can be used, see 
> https://www.ibm.com/docs/pt/aix/7.2?topic=interfaces-perfstat-memory-total-interface
>  .

Small nit, otherwise good.

src/jdk.management/unix/native/libmanagement_ext/OperatingSystemImpl.c line 113:

> 111: throw_internal_error(env, "perfstat_memory_total failed");
> 112: }
> 113: return available ? (jlong)(memory_info.pgsp_free * 4L * 1024L) : 
> (jlong)(memory_info.pgsp_total * 4L * 1024L);

Do we need the cast? perfstat_memory_total_t members are all 64-bit, no?

Also, can we shorten this to:


return (available ? memory_info.pgsp_free : memory_info.pgsp_total) * 4096;

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17569#pullrequestreview-1843868729
PR Review Comment: https://git.openjdk.org/jdk/pull/17569#discussion_r1466454083


Re: RFR: JDK-8324637: [aix] Implement support for reporting swap space in jdk.management

2024-01-25 Thread Thomas Stuefe
On Thu, 25 Jan 2024 12:30:15 GMT, Matthias Baesken  wrote:

> The get_total_or_available_swap_space_size coding misses AIX support, we only 
> return 0. This should be enhanced.
> The perfstat API can be used, see 
> https://www.ibm.com/docs/pt/aix/7.2?topic=interfaces-perfstat-memory-total-interface
>  .

Small nit, otherwise good.

src/jdk.management/unix/native/libmanagement_ext/OperatingSystemImpl.c line 113:

> 111: throw_internal_error(env, "perfstat_memory_total failed");
> 112: }
> 113: return available ? (jlong)(memory_info.pgsp_free * 4L * 1024L) : 
> (jlong)(memory_info.pgsp_total * 4L * 1024L);

Do we need the cast? perfstat_memory_total_t members are all 64-bit, no?

Also, can we shorten this to:


return (available ? memory_info.pgsp_free : memory_info.pgsp_total) * 4096;

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17569#pullrequestreview-1843868729
PR Review Comment: https://git.openjdk.org/jdk/pull/17569#discussion_r1466454083


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]

2024-01-23 Thread Thomas Stuefe
On Tue, 16 Jan 2024 08:36:49 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Update porting_aix.cpp
>  - Update porting_aix.cpp
>  - Update os_aix.cpp

For me the unresolved question is still:
- do we want an unconditional load of *.a for a given *.so (have yet to see any 
documentation for this a-file duality)
- if we do, do we want that to be bidirectional? Someone specifies *.a, do we 
want to attempt to load *.so?

When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I would 
like a clear documentation as a comment in os_aix.cpp explaining the logic and 
referencing the relevant OpenJ9 files.

There are some minor issues with the code itself, but I will defer reviewing.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1907550400


Re: RFR: JDK-8323330: [BACKOUT] JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows

2024-01-10 Thread Thomas Stuefe
On Wed, 10 Jan 2024 09:18:53 GMT, Matthias Baesken  wrote:

> There have been concerns raised about 
> [JDK-8276809](https://bugs.openjdk.org/browse/JDK-8276809) , so bring back 
> the old behavior.

okay

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17341#pullrequestreview-1814102655


Re: RFR: JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows

2024-01-09 Thread Thomas Stuefe
On Tue, 2 Jan 2024 15:03:12 GMT, Matthias Baesken  wrote:

> The new test java/awt/font/JNICheck/FreeTypeScalerJNICheck.java introduced 
> with https://bugs.openjdk.java.net/browse/JDK-8269223 adds -Xcheck:jni , and 
> shows on Windows server 2019 the following JNI warning , so the test fails on 
> this Windows version.
> 
>  stdout: [WARNING in native method: JNI call made without checking exceptions 
> when required to from CallStaticVoidMethodV
>   at 
> sun.awt.Win32GraphicsEnvironment.initDisplay(java.desktop@23-internal/Native 
> Method)
>   at 
> sun.awt.Win32GraphicsEnvironment.initDisplayWrapper(java.desktop@23-internal/Win32GraphicsEnvironment.java:95)
>   at 
> sun.awt.Win32GraphicsEnvironment.(java.desktop@23-internal/Win32GraphicsEnvironment.java:63)
>   at 
> sun.awt.PlatformGraphicsInfo.createGE(java.desktop@23-internal/PlatformGraphicsInfo.java:34)
>   at 
> java.awt.GraphicsEnvironment$LocalGE.createGE(java.desktop@23-internal/GraphicsEnvironment.java:93)
>   at 
> java.awt.GraphicsEnvironment$LocalGE.(java.desktop@23-internal/GraphicsEnvironment.java:84)
>   at 
> java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(java.desktop@23-internal/GraphicsEnvironment.java:106)
>   at FreeTypeScalerJNICheck.runTest(FreeTypeScalerJNICheck.java:53)
>   at FreeTypeScalerJNICheck.main(FreeTypeScalerJNICheck.java:44)
> 
> 
> We better add an exception check to get rid of the JNI warning (and also of 
> the test failure) .

Hi Phil,

sorry for this, this was not bad intent, just a plain mistake. It is somewhat 
difficult to remember the exact review rules per project. Help from Skara would 
certainly be useful here.

Cheers, Thomas

-

PR Comment: https://git.openjdk.org/jdk/pull/17224#issuecomment-1883471813


Re: RFR: JDK-8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows

2024-01-08 Thread Thomas Stuefe
On Tue, 2 Jan 2024 15:03:12 GMT, Matthias Baesken  wrote:

> The new test java/awt/font/JNICheck/FreeTypeScalerJNICheck.java introduced 
> with https://bugs.openjdk.java.net/browse/JDK-8269223 adds -Xcheck:jni , and 
> shows on Windows server 2019 the following JNI warning , so the test fails on 
> this Windows version.
> 
>  stdout: [WARNING in native method: JNI call made without checking exceptions 
> when required to from CallStaticVoidMethodV
>   at 
> sun.awt.Win32GraphicsEnvironment.initDisplay(java.desktop@23-internal/Native 
> Method)
>   at 
> sun.awt.Win32GraphicsEnvironment.initDisplayWrapper(java.desktop@23-internal/Win32GraphicsEnvironment.java:95)
>   at 
> sun.awt.Win32GraphicsEnvironment.(java.desktop@23-internal/Win32GraphicsEnvironment.java:63)
>   at 
> sun.awt.PlatformGraphicsInfo.createGE(java.desktop@23-internal/PlatformGraphicsInfo.java:34)
>   at 
> java.awt.GraphicsEnvironment$LocalGE.createGE(java.desktop@23-internal/GraphicsEnvironment.java:93)
>   at 
> java.awt.GraphicsEnvironment$LocalGE.(java.desktop@23-internal/GraphicsEnvironment.java:84)
>   at 
> java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(java.desktop@23-internal/GraphicsEnvironment.java:106)
>   at FreeTypeScalerJNICheck.runTest(FreeTypeScalerJNICheck.java:53)
>   at FreeTypeScalerJNICheck.main(FreeTypeScalerJNICheck.java:44)
> 
> 
> We better add an exception check to get rid of the JNI warning (and also of 
> the test failure) .

+1. I agree with Ralf, logging is not that important. Possibly just add a 
(c-runtime) assert. Up to you.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17224#pullrequestreview-1808867637


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v10]

2023-12-22 Thread Thomas Stuefe
On Fri, 22 Dec 2023 14:50:16 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   additional fix of sideeffect reported in JDK-8322691

src/hotspot/os/aix/porting_aix.cpp line 1071:

> 1069: if (max_handletable == 0) {
> 1070:   // First time we allocate memory for 128 Entries
> 1071:   char* ptmp = (char*)::malloc(128 * sizeof(struct 
> handletableentry));

No need for malloc. You can start with realloc, since realloc(NULL, ...) is 
malloc.


static handletablentry* tab = nullptr;
static unsigned max_handles = 0;
...

if (need more handles)
   unsigned new_max = MAX2(max_handles * 2, init_num_handles);
   handleentry* new_tab = ::realloc(p_handletable, sizeof(handleentry) * 
new_max);
   if (new_tab != nullptr) {
 max_handles = new_max;
 tab= new_tab;
   }

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1435139923


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]

2023-12-21 Thread Thomas Stuefe
On Thu, 21 Dec 2023 11:54:17 GMT, Martin Doerr  wrote:

>> Dynamic allocation also opens us up to potential initialization issues, 
>> unless we explicitly use raw ::malloc. It should work, but I think its 
>> better avoided unless we really need it.
>
> Well we're fixing an academic issue by introducing another one? Doesn't make 
> sense to me.

Okay, I butt out, I don't care enough. Up to you both to decide what to do. 

My recommendation would still be to avoid hotspot infrastructure that relies on 
os::malloc and friends; other than that, rewriting this table to make it 
growable using realloc should be trivial. Note that we need *some* sort of 
limit though.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433996779


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]

2023-12-21 Thread Thomas Stuefe
On Thu, 21 Dec 2023 11:23:46 GMT, Joachim Kern  wrote:

>> I don't like introducing unnecessary limitations. Are we sure nobody will 
>> ever need more than 1024 handles?
>> Can't we at least use a GrowableArray or something like that?
>
> In principle you are right, but in my opinion 1024 is an academical limit. I 
> never saw processes with more than a few dozen loaded libraries.

Dynamic allocation also opens us up to potential initialization issues, unless 
we explicitly use raw ::malloc. It should work, but I think its better avoided 
unless we really need it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433963889


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]

2023-12-21 Thread Thomas Stuefe
On Thu, 21 Dec 2023 09:37:55 GMT, Suchismith Roy  wrote:

> > > > What happens if we accidentally attempt to load a "real" static 
> > > > library, which is also named *.a? Would dlopen() then crash? What would 
> > > > happen?
> > 
> > 
> > > I don't think the problem is with *.a . They would load as the default 
> > > behaviour of the dlopen. It is only when the dlopen fails for *.so , we 
> > > give another chance to check for .a file with the same name.
> > 
> > 
> > No, what I meant, and what must be clarified before going forward with this 
> > solution, is the following:
> > 
> > * is _every_ `*.a` object on AIX loadable with `dlopen`, and will the 
> > result be the same as when loading a `*.so` object
> > * or, if we present arbitrary `*.a` files to dlopen, is there a chance for 
> > dlopen to crash or misbehave.
> > 
> > Reason is that I was under the impression that *.a libraries are static 
> > libraries and cannot be loaded dynamically. This is what you now try to do.
> > If we cannot safely answer this question, I would opt for a more narrow 
> > solution by hard-wiring known alternative names. So, do the second *.a 
> > attempt only for your `ibm_16_am.a` which you know works. That could also 
> > be done in a reasonably maintainable manner.
> 
> In AIX, both static and dynamic libraries have *.a extension. And AIX also 
> supports *.so files.Bascially shared objects in AIX have both *.a and *.so 
> extension. Hence we need to implement this logic. If we try loading a static 
> archive specifically ,how the dlopen would behave , that is something 
> probably @JoKern65 can answer ?

Rather, this is a question you have to ask your collegues at IBM that develop 
the AIX libc.

Since AIX libc is not open source, we cannot look for ourselves, nor can 
Joachim (her works at SAP).

> 
> > > > Does this really have to be handled in the OpenJDK? What does J9 on AIX 
> > > > do? Could this be done in a simpler way outside OpenJDK, e.g. by 
> > > > providing an *.so variant of the library in question? Where does this 
> > > > library come from?
> > 
> > 
> > > I am not sure how J9 handles this. I would have to consult .
> > 
> > 
> > J9 is Open Source, can't you just look? :)
> 
> I did try comparing the file structures, and i do not see a similar file 
> structure over there. I am unable to find the jvmTiAgent code and also os_aix 
> file. So i am not sure which functions over there are doing the same 
> functionality. You have any suggestion on how i can check and correlate ?

Someone must implement LoadLibrary. Try looking for places where dlopen() is 
called.

> 
> > > However as per current observation, this issue does not show up on 
> > > Semuru. This issue is only happening on Adoptium. The team that release 
> > > these file has always released *.a files which work fine for Semuru.
> > 
> > 
> > I don't know what Semuru is. What is the context, is that a different VM? 
> > Also OpenJDK? J9 derived?
> 
> Semuru is J9 derived.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865977132


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]

2023-12-21 Thread Thomas Stuefe
On Thu, 21 Dec 2023 09:37:57 GMT, Joachim Kern  wrote:

>> src/hotspot/os/aix/porting_aix.cpp line 916:
>> 
>>> 914: constexpr int max_handletable = 1024;
>>> 915: static int g_handletable_used = 0;
>>> 916: static struct handletableentry g_handletable[max_handletable] = {{0, 
>>> 0, 0, 0}};
>> 
>> Wouldn't `ConcurrentHashTable` be a better data structure? It is already 
>> used in hotspot, can grow dynamically and doesn't need linear search.
>
> There will be only few libraries in the list. With this assumption Thomas 
> suggested to use just a simple array.

Let's keep it simple. A linear array of only a few items is easily scanned, 
probably faster than pointer hopping hash table entries. Not that it matters in 
any way for the few calls to dlopen.

Also, avoiding hotspot structures preserves layer integrity (porting_aix does 
not pull anything from hotspot so far) and prevents initialisation time 
dependencies. Not sure whether ConcurrentHashTable works before VM init, but 
with Joachimes current solution, we can call dlopen at any time in VM life.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433839119


Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]

2023-12-21 Thread Thomas Stuefe
On Thu, 21 Dec 2023 08:16:22 GMT, Suchismith Roy  wrote:

>> What happens if we accidentally attempt to load a "real" static library, 
>> which is also named *.a? Would dlopen() then crash? What would happen?

> I don't think the problem is with *.a . They would load as the default 
> behaviour of the dlopen. It is only when the dlopen fails for *.so , we give 
> another chance to check for .a file with the same name.

No, what I meant, and what must be clarified before going forward with this 
solution, is the following:

- is *every* `*.a` object on AIX loadable with `dlopen`, and will the result be 
the same as when loading a `*.so` object
- or, if we present arbitrary `*.a` files to dlopen, is there a chance for 
dlopen to crash or misbehave.

Reason is that I was under the impression that *.a libraries are static 
libraries and cannot be loaded dynamically. This is what you now try to do.

If we cannot safely answer this question, I would opt for a more narrow 
solution by hard-wiring known alternative names. So, do the second *.a attempt 
only for your `ibm_16_am.a` which you know works. That could also be done in a 
reasonably maintainable manner.

>> Does this really have to be handled in the OpenJDK? What does J9 on AIX do? 
>> Could this be done in a simpler way outside OpenJDK, e.g. by providing an 
>> *.so variant of the library in question? Where does this library come from?

> I am not sure how J9 handles this. I would have to consult .

J9 is Open Source, can't you just look? :)

> However as per current observation, this issue does not show up on Semuru. 
> This issue is only happening on Adoptium. The team that release these file 
> has always released *.a files which work fine for Semuru.

I don't know what Semuru is. What is the context, is that a different VM? Also 
OpenJDK? J9 derived?

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865859345


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]

2023-12-20 Thread Thomas Stuefe
On Wed, 20 Dec 2023 14:53:06 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   improve error handling

still ok, small nit inside

src/hotspot/os/aix/porting_aix.cpp line 1033:

> 1031: // filled by os::dll_load(). This way we mimic dl handle equality for a 
> library
> 1032: // opened a second time, as it is implemented on other platforms.
> 1033: void* Aix_dlopen(const char* filename, int Flags, const char** 
> error_report) {

add assert for error_report != nullptr

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1792301031
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433606032


Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v7]

2023-12-20 Thread Thomas Stuefe
On Wed, 20 Dec 2023 11:16:03 GMT, Suchismith Roy  wrote:

>> J2SE agent does not start and throws error when it tries to find the shared 
>> library ibm_16_am.
>> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load 
>> fails.It fails to identify the shared library ibm_16_am.a shared archive 
>> file on AIX.
>> Hence we are providing a function which will additionally search for .a file 
>> on AIX ,when the search for .so file fails.
>
> Suchismith Roy has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Spaces fix

Hi,

some requests and questions:

- Please modify the JBS title, PR title, and JBS issue text to reflect that 
this adds an alternative shared object loading path for shared objects on AIX. 
Something like "Allow loading shared objects with .a extension on AIX". Please 
describe the new logic in the JBS issue text.
- Does this really have to be handled in the OpenJDK? What does J9 on AIX do? 
Could this be done in a simpler way outside OpenJDK, e.g. by providing an *.so 
variant of the library in question? Where does this library come from?
- What happens if we accidentally attempt to load a "real" static library, 
which is also named *.a? Would dlopen() then crash? What would happen?
- What happens if the original path handed to os::dll_load is already a *.a 
file? Should the logic then be reversed?
- We really need regression tests for this.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1864572287


Re: RFR: 8314488: Compile the JDK as C++17 [v2]

2023-12-20 Thread Thomas Stuefe
On Fri, 15 Dec 2023 16:20:02 GMT, Kim Barrett  wrote:

>> Julian Waters has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains five additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'openjdk:master' into patch-7
>>  - Revert vm_version_linux_riscv.cpp
>>  - vm_version_linux_riscv.cpp
>>  - allocation.cpp
>>  - 8310260
>
> I agree that before throwing this switch, we need to look at some specific
> issues that might need to be addressed, discuss the benefits, and also the
> costs.
> 
> As was discussed for the change to C++14, there is *never* a good time to
> start introducing the use of new language features as far as backporting is
> concerned, unless one is going to backport the language change too. We didn't
> do that for C++14, and I don't think we are going to (nor should) do it for
> C++17 either.  But backporting concerns can't be all powerful, as that will
> forever prevent potentially significant improvements.
> 
> I started to make a list of new language features that seem particularly
> beneficial or otherwise important.  I was going to write style guide updates
> for these, but haven't gotten very far with that yet.
> 
> P0035R4: Dynamic memory allocation for over-aligned data
> P0135R1: Guaranteed copy elision
> P0145R3: Refining Expression Evaluation Order for Idiomatic C++
> P0292R2: constexpr if
> P0091R3/P0512R0: Template argument deduction for class templates
> 
> Here are some others that might be of interest to us.
> N4268: Allow constant evaluation for all non-type template arguments
> N3928: Extending static_assert
> P0118R1: [[fallthrough]] attribute
> P0189R1: [[nodiscard]] attribute
> P0212R1: [[maybe_unused]] attribute
> P0170R1: Wording for constexpr lambda
> P0283R2: Ignoring unsupported non-standard attributes 
> P0061R1: __has_include for C++17
> P0386R2: Inline variables

@kimbarrett 
> P0035R4: Dynamic memory allocation for over-aligned data

Do we really need this? I ask because, in the end, this will result in 
something like `posix_memalign` to be called, and I remember it being notorious 
for causing large footprint overhead depending on how smart the underlying 
allocator is about using alignment waste.

It will also be non-trivial to implement in hotspot since NMT uses malloc 
headers. Barring a rewrite of NMT malloc metadata tracking (e.g. using a hash 
map, which would be more costly both in terms of performance and, probably, 
footprint), malloc headers would have to be revised. Probably would need to be 
dynamic-sized. This is the reason we did not bother wrapping posix_memalign.

-

PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1864039399


Re: RFR: 8322417: Console read line with zero out should zero out when throwing exception

2023-12-19 Thread Thomas Stuefe
On Tue, 19 Dec 2023 12:47:53 GMT, Goetz Lindenmaier  wrote:

> …g exception
> 
> After leaving the method by throwing an exception the data can not be cleaned 
> any more.

Seems reasonable.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17156#pullrequestreview-1789380436


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]

2023-12-19 Thread Thomas Stuefe
On Tue, 19 Dec 2023 12:37:33 GMT, Suchismith Roy  wrote:

>> The libpath parsing code is from me, so no license problems.
>
> Hi @JoKern65  Is this good to integrate now ?

@suchismith1993 

> Once this code goes in I can push in my changes. We are targeting the fix for 
> January.

If you talk about https://github.com/openjdk/jdk/pull/16604, no, you cannot 
push that even if Joachim finishes his work.

Your patch has not even a single review, is quite controversial, and none of 
the issues the reviewers have mentioned are addressed. This needs a lot more 
discussion time.

-

PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862704694


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]

2023-12-19 Thread Thomas Stuefe
On Mon, 18 Dec 2023 13:33:46 GMT, Thomas Stuefe  wrote:

>> Joachim Kern has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Followed Thomas proposals
>
> Well done.
> 
> Releasing the mutex before asserting is not necessary; we don't pull the 
> handle table lock as part of error reporting.

> @tstuefe Sorry to tag you. Can you review the code. Once this code goes in I 
> can push in my changes.
We are targeting the fix for January.

> Hi @JoKern65 Is this good to integrate now ?

@suchismith1993 Please don't put pressure on patch authors and developers. 
There is zero reason why this patch should be rushed. 

> Hi @suchismith1993, I'm waiting for a second review. Complex hotspot changes 
> should be reviewed twice.

Not only that, hotspot changes *need* to be reviewed by at least two reviewers. 
That is not optional. See OpenJDK bylaws.

-

PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862695052


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]

2023-12-18 Thread Thomas Stuefe
On Mon, 18 Dec 2023 11:12:23 GMT, Joachim Kern  wrote:

>> src/hotspot/os/aix/porting_aix.cpp line 1097:
>> 
>>> 1095:   }
>>> 1096: 
>>> 1097:   pthread_mutex_lock(_handletable_mutex);
>> 
>> You can make your life a lot easier by defining an RAII object at the start 
>> of the file:
>> 
>> struct TableLocker {
>>   TableLocker() { pthread_mutex_lock(_handletable_mutex); }
>>   ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); }
>> };
>> 
>> and just place this at the beginning of your two functions
>> 
>> TableLocker lock:
>> ...
>> 
>> 
>> no need to manually unlock then, with the danger of missing a return.
>
> Great, thank you. This was one of the things I thought about, but was not 
> sure, because I did not fully understood the MutexLocker class and the 
> difference between Monitor and Mutex.

In hindsight, pthread mutex is the better choice anyway: it avoids dependencies 
to the VM lifecycle (VM mutexes are only available after VM initialization, so 
we could not call dlopen() before that).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1430380082


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]

2023-12-18 Thread Thomas Stuefe
On Mon, 18 Dec 2023 10:35:48 GMT, Joachim Kern  wrote:

>> src/hotspot/os/aix/porting_aix.cpp line 990:
>> 
>>> 988:   if (env == nullptr) {
>>> 989: // no LIBPATH, try with LD_LIBRARY_PATH
>>> 990: env = getenv("LD_LIBRARY_PATH");
>> 
>> Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX.
>
> Yes it is, It's the fallback if LIBPATH is not defined

In that case there may be errors in other places, since so far we assumed its 
either one or the other, but not both. Example:

https://github.com/openjdk/jdk/blob/a247d0c74bea50f11d24fb5f3576947c6901e567/src/java.base/unix/native/libjli/java_md.c#L43C1-L47

Maybe you need to take a look here, in case LD_LIBRARYPATH needs to be handled 
in addition to LIBPATH?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429917901


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]

2023-12-18 Thread Thomas Stuefe
On Mon, 18 Dec 2023 11:30:59 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Followed Thomas proposals

Well done.

Releasing the mutex before asserting is not necessary; we don't pull the handle 
table lock as part of error reporting.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786905733


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]

2023-12-18 Thread Thomas Stuefe
On Mon, 18 Dec 2023 10:06:34 GMT, Thomas Stuefe  wrote:

>> Joachim Kern has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - trailing whitespace
>>  - Following most of Thomas proposals
>
> src/hotspot/os/aix/porting_aix.cpp line 934:
> 
>> 932:   struct scnhdr the_scn;
>> 933:   struct ldhdr the_ldr;
>> 934:   size_t sz = FILHSZ + _AOUTHSZ_EXEC;
> 
> please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...`

Also, can you please add

STATIC_ASSERT(sizeof(the_xcoff) == xcoffsz);
STATIC_ASSERT(sizeof(the_scn) == SCNHSZ);
STATIC_ASSERT(sizeof(the_ldr) == LDHDRSZ);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429853292


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]

2023-12-18 Thread Thomas Stuefe
On Fri, 15 Dec 2023 11:57:51 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - trailing whitespace
>  - Following most of Thomas proposals

I like this, this is good.

Small nits remain.

src/hotspot/os/aix/os_aix.cpp line 30:

> 28: #pragma alloca
> 29: 
> 30: 

please remove whitespace change

src/hotspot/os/aix/os_aix.cpp line 193:

> 191: // local variables
> 192: 
> 193: 

please remove whitespace change

src/hotspot/os/aix/os_aix.cpp line 1113:

> : }
> 1112: 
> 1113: 

please remove whitespace change

src/hotspot/os/aix/porting_aix.cpp line 934:

> 932:   struct scnhdr the_scn;
> 933:   struct ldhdr the_ldr;
> 934:   size_t sz = FILHSZ + _AOUTHSZ_EXEC;

please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...`

src/hotspot/os/aix/porting_aix.cpp line 990:

> 988:   if (env == nullptr) {
> 989: // no LIBPATH, try with LD_LIBRARY_PATH
> 990: env = getenv("LD_LIBRARY_PATH");

Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX.

src/hotspot/os/aix/porting_aix.cpp line 1005:

> 1003: // LIBPATH or LD_LIBRARY_PATH and second with burned in libpath.
> 1004: // No check against current working directory
> 1005: Libpath.print("%s:%s", env, rtv_linkedin_libpath());

Are you sure libpath env var has precedence over the baked-in libpath?

src/hotspot/os/aix/porting_aix.cpp line 1097:

> 1095:   }
> 1096: 
> 1097:   pthread_mutex_lock(_handletable_mutex);

You can make your life a lot easier by defining an RAII object at the start of 
the file:

struct TableLocker {
  TableLocker() { pthread_mutex_lock(_handletable_mutex); }
  ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); }
};

and just place this at the beginning of your two functions

TableLocker lock:
...


no need to manually unlock then, with the danger of missing a return.

src/hotspot/os/aix/porting_aix.cpp line 1101:

> 1099:   for (i = 0; i < g_handletable_used; i++) {
> 1100: if (g_handletable[i].handle == libhandle) {
> 1101:   // handle found, decrease refcount

`assert(refcount > 0, "Sanity"))`

src/hotspot/os/aix/porting_aix.cpp line 1143:

> 1141:   // entry of the array to the place of the entry we want to remove 
> and overwrite it
> 1142:   if (i < g_handletable_used) {
> 1143: g_handletable[i] = g_handletable[g_handletable_used];

To be super careful, I would zero out at least the handle of the moved item 
like this:
`g_handletable[g_handletable_used].handle = nullptr`

-

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786400492
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870755
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870833
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870885
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429849403
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429858465
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429859923
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429868182
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429863665
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870057


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]

2023-12-15 Thread Thomas Stuefe
On Fri, 15 Dec 2023 10:18:53 GMT, Joachim Kern  wrote:

>> src/hotspot/os/aix/os_aix.cpp line 206:
>> 
>>> 204: constexpr int max_handletable = 1024;
>>> 205: static int g_handletable_used = 0;
>>> 206: static struct handletableentry g_handletable[max_handletable] = {{0, 
>>> 0, 0, 0}};
>> 
>> I would move all that new and clearly delineated dlopen stuff into an own 
>> file, e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already 
>> have wrappers for other functions). os_aix.cpp is already massive.
>
> I moved the static variable declarations and the functions `Aix_dlopen(), 
> search_file_in_LIBPATH(), rtv_linkedin_libpath()` and  `os::pd_dll_unload()` 
> to porting_aix.cpp. This links, but in my opinion `os::pd_dll_unload()` 
> should reside in os_aix.cpp, because it is member of the os class. But there 
> it will not compile anymore if the static variables are moved away.

No, what I meant was to provide a "libc-like" equivalent for dlopen, similar to 
what we do with dladdr (see 
https://github.com/openjdk/jdk/blob/b7676822886eac21f61ff361a32928a966d8fe31/src/hotspot/os/aix/porting_aix.cpp#L306).

But never mind; I am also fine with moving os::pd_dlopen into a different cpp 
file, e.g. "dlopen_aix.cpp". Just move it out of os_aix.cpp, since that is 
already massive and you add >300 lines of more code and more dependencies.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427812795


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]

2023-12-15 Thread Thomas Stuefe
On Fri, 15 Dec 2023 09:57:19 GMT, Joachim Kern  wrote:

>  If we omit the xcoff32 we have to ensure that no xcoff32 executable file 
> comes into play.

xcoff32 is for 32-bit binaries. The AIX port only exists for 64-bit, and there 
will never be a 32-bit AIX port, so there is no reason for handling 32-bit 
xcoff headers.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427803763


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]

2023-12-14 Thread Thomas Stuefe
On Fri, 15 Dec 2023 06:22:39 GMT, Thomas Stuefe  wrote:

>> Joachim Kern has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   followed the proposals
>
> src/hotspot/os/aix/os_aix.cpp line 1129:
> 
>> 1127: 
>> 1128: // get the library search path burned in to the executable file during 
>> linking
>> 1129: // If the libpath cannot be retrieved return an empty path
> 
> This is new. Is this complexity needed, if yes, why? Don't see a comment, may 
> have missed it.

Also, why are we parsing xcoff32 headers in there? AIX OpenJDK will always be 
64-bit. So, you can replace the whole xcoff32 section with assert( f_magic == 
U802TOCMAGIC, ..). The function becomes a lot simpler then.

> src/hotspot/os/aix/os_aix.cpp line 1132:
> 
>> 1130: static const char* rtv_linkedin_libpath() {
>> 1131:   static char buffer[4096];
>> 1132:   static const char* libpath = 0;
> 
> If your intent is to return an empty buffer if there is no contained libpath, 
> I would just:
> 
> 
> static const char* libpath = "";
> 
> then you can always just return libpath.

But looking at the using code, returning NULL in case there is no contained 
libpath would be actually easier, see below.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427609926
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427639138


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]

2023-12-14 Thread Thomas Stuefe
On Tue, 12 Dec 2023 14:05:48 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   followed the proposals

Is this libpath parsing code copied from the R3 kernel? If yes, pls make sure 
there are no licensing issues.

src/hotspot/os/aix/os_aix.cpp line 206:

> 204: constexpr int max_handletable = 1024;
> 205: static int g_handletable_used = 0;
> 206: static struct handletableentry g_handletable[max_handletable] = {{0, 0, 
> 0, 0}};

I would move all that new and clearly delineated dlopen stuff into an own file, 
e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already have 
wrappers for other functions). os_aix.cpp is already massive.

src/hotspot/os/aix/os_aix.cpp line 1129:

> 1127: 
> 1128: // get the library search path burned in to the executable file during 
> linking
> 1129: // If the libpath cannot be retrieved return an empty path

This is new. Is this complexity needed, if yes, why? Don't see a comment, may 
have missed it.

src/hotspot/os/aix/os_aix.cpp line 1131:

> 1129: // If the libpath cannot be retrieved return an empty path
> 1130: static const char* rtv_linkedin_libpath() {
> 1131:   static char buffer[4096];

This coding has some issues:

- a generic char buffer is not a good idea. Forces you to do casts all over the 
place, and introduces alignment issues with unaligned char buffer. Which I 
assume is the reason for all the separate memcpy-into-structures below. I would 
just read into the structures directly.
- you need to check the return codes for fread to make sure you read the number 
of bytes expected, lest you work with uninitialized memory and maybe to handle 
sporadic EINTR.
- I don't get all the separate "SZ" macros. They must be equal to 
sizeof(structure), right, otherwise you get buffer overruns or work with 
uninitialized memory?

Proposal: add a local wrapper function like this:


template 
static bool my_checked_fread(FILE* f, T* out) {
  // read sizeof(T) from f. 
  // Check return code. 
  // Return bool if sizeof(T) bytes were read.
  e.g. in a very trivial form:
  int bytesread = fread(out, sizeof(T), 1, f);
  return bytesread == sizeof(T);
}


and use it in your code like this:


struct xcoff64 the_xcoff64;
struct scn64 the_scn64;
struct ldr64 the_ldr64;

if (!my_checked_fread(f, _xcoff64)) {
   assert?
}
...

if (!my_checked_fread(f, _ldr64) { 
  .. handle error  
}

src/hotspot/os/aix/os_aix.cpp line 1132:

> 1130: static const char* rtv_linkedin_libpath() {
> 1131:   static char buffer[4096];
> 1132:   static const char* libpath = 0;

If your intent is to return an empty buffer if there is no contained libpath, I 
would just:


static const char* libpath = "";

then you can always just return libpath.

src/hotspot/os/aix/os_aix.cpp line 1135:

> 1133: 
> 1134:   if (libpath)
> 1135: return libpath;

{ }

src/hotspot/os/aix/os_aix.cpp line 1137:

> 1135: return libpath;
> 1136: 
> 1137:   char pgmpath[32+1];

Will overflow if pid_t is 64bit. Give it a larger size; after all, you are 
giving buffer 4K above, so you are not overly concerned with saving stack space.

src/hotspot/os/aix/os_aix.cpp line 1146:

> 1144:   fread(buffer, 1, FILHSZ_64 + _AOUTHSZ_EXEC_64, f);
> 1145: 
> 1146:   if (((struct filehdr*)buffer)->f_magic == U802TOCMAGIC ) {

as stated above, I don't think this section is needed.

src/hotspot/os/aix/os_aix.cpp line 1170:

> 

Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]

2023-12-05 Thread Thomas Stuefe
On Tue, 5 Dec 2023 13:21:35 GMT, Thomas Stuefe  wrote:

>> Joachim Kern has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   encapsulate everything in os::Aix::dlopen
>
> src/hotspot/os/aix/os_aix.cpp line 3133:
> 
>> 3131: return nullptr;
>> 3132:   }
>> 3133:   // library not still loaded and still place in array, so load 
>> library
> 
> s/still/yet

No need to be this verbose either, especially since the comment is somewhat 
misleading. "create entry at end of table" implies that we have a dynamically 
growing table and allocate new entries. Proposal: "Library not yet loaded; load 
it, then store its handle in handle table".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415605856


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v2]

2023-12-05 Thread Thomas Stuefe
On Mon, 4 Dec 2023 12:33:26 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   improve handling of nonexisting files

src/hotspot/os/aix/os_aix.cpp line 203:

> 201: constexpr int max_handletable = 1024;
> 202: static int g_handletable_used = 0;
> 203: static struct handletableentry g_handletable[max_handletable] = 
> {{0,0,0,0}};

style nits:
- we usually write the * behind type, not before var name
- `{{0,0}}` -> insert spaces

src/hotspot/os/aix/os_aix.cpp line 1159:

> 1157: result = ::dlopen(filename, dflags);
> 1158: if (result != nullptr) {
> 1159:   assert(false, "dll_load: Could not stat() file %s, but dlopen() 
> worked; Have to improve stat()", filename);

use assert(result != nullptr) and remove condition

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413843503
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413846111


Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]

2023-12-05 Thread Thomas Stuefe
On Tue, 5 Dec 2023 12:11:46 GMT, Joachim Kern  wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   encapsulate everything in os::Aix::dlopen

Excellent, this is how I have pictured a good solution. Very nice.

A number of remarks, but nothing fundamental.

src/hotspot/os/aix/os_aix.cpp line 1137:

> 1135: if (ebuf != nullptr && ebuflen > 0) {
> 1136:   ::strncpy(ebuf, "dll_load: empty filename specified", ebuflen - 
> 1);
> 1137: }

Are there any cases where we don't hand in the error buffer? If so, I would 
just assert ebuf and ebuflen. No need for this kind of flexibility.

src/hotspot/os/aix/os_aix.cpp line 3051:

> 3049: 
> 3050: // Simulate the library search algorithm of dlopen() (in os::dll_load)
> 3051: int os::Aix::stat64x_via_LIBPATH(const char* path, struct stat64x* 
> stat) {

- no need to export this, make it filescope static
- please return bool, with false = error
- please rename it to something like "search_file_in_LIBPATH"

src/hotspot/os/aix/os_aix.cpp line 3055:

> 3053: return -1;
> 3054: 
> 3055:   char *path2 = strdup (path);

Please use os::strdup and os::free. If you really intent to use the plain libc 
versions, use `::strdup` and `::free` to make sure - and indicate to code 
readers - you use the global libc variants.

src/hotspot/os/aix/os_aix.cpp line 3059:

> 3057:   int idx = strlen(path2) - 1;
> 3058:   if (path2[idx] == ')') {
> 3059: while (path2[idx] != '(' && idx > 0) idx--;

? Why not `strrchr()`?

src/hotspot/os/aix/os_aix.cpp line 3067:

> 3065:   if (path2[0] == '/' ||
> 3066:   (path2[0] == '.' && (path2[1] == '/' ||
> 3067:   (path2[1] == '.' && path2[2] == '/' {

This complexity is not needed, nor is it sufficient, since it does not handle 
relative paths ("mydirectory/hallo.so")

https://www.ibm.com/docs/en/aix/7.1?topic=d-dlopen-subroutine

"If FilePath contains a slash character, FilePath is used directly, and no 
directories are searched. "

So, just scan for a '/' - if you find one, its a path to be opened directly:


const bool use_as_filepath = strchr(path2, '/');

src/hotspot/os/aix/os_aix.cpp line 3085:

> 3083:   strcpy(libpath, env);
> 3084:   for (token = strtok_r(libpath, ":", ); token != nullptr; 
> token = strtok_r(nullptr, ":", )) {
> 3085: sprintf(combined, "%s/%s", token, path2);

You can save a lot of pain and manual labor by using `stringStream` here. 


stringStream combined;
combined.print("%s/%s", token, path2);
const char* combined_path_string = combined.base();

no need for manual allocation and byte counting.

src/hotspot/os/aix/os_aix.cpp line 3099:

> 3097: // filled by os::dll_load(). This way we mimic dl handle equality for a 
> library
> 3098: // opened a second time, as it is implemented on other platforms.
> 3099: void* os::Aix::dlopen(const char* filename, int Flags) {

Does not need to be exported, nor does os::AIX::dlclose. Make file scope 
static. See my remarks in os_posix.cpp.

src/hotspot/os/aix/os_aix.cpp line 3103:

> 3101:   struct stat64x libstat;
> 3102: 
> 3103:   if (os::Aix::stat64x_via_LIBPATH(filename, )) {

Please return bool, not unix int -1, this hurts my brain :-)

src/hotspot/os/aix/os_aix.cpp line 3108:

> 3106: if (result != nullptr) {
> 3107:   assert(false, "dll_load: Could not stat() file %s, but dlopen() 
> worked; Have to improve stat()", filename);
> 3108:  

Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]

2023-12-04 Thread Thomas Stuefe
On Mon, 4 Dec 2023 00:41:23 GMT, David Holmes  wrote:

> From the blog:
> 
> > Yes! The methods are being deallocated for a class loader that is still 
> > alive.
> 
> Okay so why does this happen and is it a reasonable thing to be happening? On 
> the surface it sounds wrong to deallocate anything associated with a live 
> classloader.

This sounds odd to me to. I know that we deallocate the old *byte code* of 
re-transformed classes; but `Method*` ?

-

PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1838413238


Re: RFR: 8320935: Move CDS config initialization code to cdsConfig.cpp [v2]

2023-12-01 Thread Thomas Stuefe
On Sat, 2 Dec 2023 00:38:58 GMT, Ioi Lam  wrote:

>> This is a simple clean up that moves the code for initializing the CDS 
>> config states from arguments.cpp to cdsConfig.cpp
>> 
>> I renamed a few functions, but otherwise the code is unchanged.
>> 
>> - `get_default_shared_archive_path()` -> `default_archive_path()`
>> - `GetSharedArchivePath()` -> `static_archive_path()`
>> - `GetSharedDynamicArchivePath()` -> `dynamic_archive_path()`
>> 
>> There's also less `#if INCLUDE_CDS` since the entire cdsConfig.cpp file is 
>> compiled only if CDS is enabled.
>
> Ioi Lam has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fixed indentation

Looks good. Did not find any functional difference to the original code.

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16868#pullrequestreview-1760796058


Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]

2023-11-30 Thread Thomas Stuefe
On Wed, 29 Nov 2023 11:49:31 GMT, Jaroslav Bachorik  
wrote:

>> Please, review this fix for a corner case handling of `jmethodID` values.
>> 
>> The issue is related to the interplay between `jmethodID` values and method 
>> redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` 
>> instance. Once that method gets redefined, the `jmethodID` is updated to 
>> point to the last `Method` version. 
>> Unless the method is still on stack/running, in which case the original 
>> `jmethodID` will be redirected to the latest `Method` version and at the 
>> same time the 'previous' `Method` version will receive a new `jmethodID` 
>> pointing to that previous version.
>> 
>> If we happen to capture stacktrace via `GetStackTrace` or 
>> `GetAllStackTraces` JVMTI calls while this previous `Method` version is 
>> still on stack we will have the corresponding frame identified by a 
>> `jmethodID` pointing to that version.
>> However, sooner or later the 'previous' class version becomes eligible for 
>> cleanup at what time all contained `Method` instances. The cleanup process 
>> will not perform the `jmethodID` pointer maintenance and we will end up with 
>> pointers to deallocated memory. 
>> This is caused by the fact that the `jmethodID` lifecycle is bound to 
>> `ClassLoaderData` instance and all relevant `jmethodID`s will get 
>> batch-updated when the class loader is being released and all its classes 
>> are getting unloaded. 
>> 
>> This means that we need to make sure that if a `Method` instance is being 
>> deallocate the associated `jmethodID` (if any) must not point to the 
>> deallocated instance once we are finished. Unfortunately, we can not just 
>> update the `jmethodID` values in bulk when purging an old class version - 
>> the per `InstanceKlass` jmethodID cache is present only for the main class 
>> version and contains `jmethodID` values for both the old and current method 
>> versions. 
>> 
>> ~Therefore we need to perform `jmethodID` lookup when we are about to 
>> deallocate a `Method` instance and clean up the pointer only if that 
>> `jmethodID` is pointing to the `Method` instance which is being deallocated.~
>> 
>> Therefore, we need to perform `jmethodID` lookup for each method in an old 
>> class version that is getting purged, and null out the pointer of that 
>> `jmethodID` to break the link from `jmethodID` to the method instance that 
>> is about to get deallocated.
>> 
>> _(For anyone interested, a much lengthier writeup is available in [my 
>> blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_
>
> Jaroslav Bachorik has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Restrict cleanup to obsolete methods only

I won't be able to review this this week, too snowed in atm. I can take a look 
next week. We can always just revert the change if needed.

Thinking about Skara, I think as long as we have this confusing mixture of 
rules (hotspot wants 2 reviewers that are Reviewer/Committer, but some jdk libs 
only want one, but then you need two for desktop I think otherwise Phil gets 
angry) - we should hard-code the 2-reviewer rule into skara as default since it 
affects the lion's share of all changes.

-

PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1835474161


Re: RFR: 8320921: GHA: Parallelize hotspot_compiler test jobs

2023-11-29 Thread Thomas Stuefe
On Tue, 28 Nov 2023 18:15:56 GMT, Aleksey Shipilev  wrote:

> In current GHA, `hotspot_compiler` testing takes a long time, and often takes 
> the longest. On MacOS and Windows it routinely takes 60..80 minutes, while 
> other test groups run in 30..40 minutes. This often drags the total wall 
> clock time to run the GHA up. Fortunately, there are already several parts in 
> `hotspot_compiler` test group, which we might just run separately:
> https://github.com/openjdk/jdk/blob/dc256fbc6490f8163adb286dbb7380c10e5e1e06/test/hotspot/jtreg/TEST.groups#L142-L146
> 
> There is a bit of awkwardness about `not_xcomp` subgroup, which does not have 
> lots of tests. Instead of doing another short job, I think we can just meld 
> it into the quickest "part 2".
> 
> 
> # Before
> linux-x64 / test (hs/tier1 compiler) 38m 27s  
> windows-x64 / test (hs/tier1 compiler): 1h 17m 30s
> macos-x64 / test (hs/tier1 compiler): 1h 2m 53s 
> 
> # After
> linux-x64 / test (hs/tier1 compiler part 1): 11m 38s  
> linux-x64 / test (hs/tier1 compiler part 2): 9m 34s   
> linux-x64 / test (hs/tier1 compiler part 3): 21m 40s  
> 
> windows-x64 / test (hs/tier1 compiler part 1): 24m 19s
> windows-x64 / test (hs/tier1 compiler part 2): 18m 54s
> windows-x64 / test (hs/tier1 compiler part 3): 43m 32s
> 
> macos-x64 / test (hs/tier1 compiler part 1): 26m 44s  
> macos-x64 / test (hs/tier1 compiler part 2): 20m 40s  
> macos-x64 / test (hs/tier1 compiler part 3): 42m 10s
> 
> 
> Additional testing:
>  - [x] GHA (current run is in progress, but I had clean runs before)

Good!

-

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16860#pullrequestreview-1755113396


Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]

2023-11-28 Thread Thomas Stuefe
On Tue, 28 Nov 2023 11:27:33 GMT, suchismith1993  wrote:

> > 
> 
> @tstuefe 3rd parameter to pass the either of 2 things:
> 
> 1. The JvmTiAgent object "agent", so that after shifting the 
> save_library_signature to os_aix,we can still access the agent object.-> For 
> this i tried importing jvm/prims header file, but i get segmentation faults 
> during build . Not sure if i am doing it the right way.
> 
> 2. Pass a character buffer(and not const char*) where we copy the 
> modified filename back to it and then use it in jvmAgent. code.

Does not sound really appealing tbh. We pile one hack atop of another.

Please synchronize with @JoKern65 at SAP. He will rewrite the JVMTI handler 
code, which will make this point moot. See 
https://bugs.openjdk.org/browse/JDK-8320890.

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829776298


Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]

2023-11-28 Thread Thomas Stuefe
On Mon, 27 Nov 2023 15:44:58 GMT, suchismith1993  wrote:

> > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if 
> > > > the condition fails for .so files, because i have to reload it again 
> > > > and check if the .a exists. In the shared code i had repeat less number 
> > > > of lines i believe. Do you suggest moving lines 1132 to 1139 to another 
> > > > function then ?
> > > 
> > > 
> > > @tstuefe Any suggestion on this ?
> > 
> > 
> > ```
> > --- a/src/hotspot/os/aix/os_aix.cpp
> > +++ b/src/hotspot/os/aix/os_aix.cpp
> > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, 
> > char* buf,
> >return true;
> >  }
> >  
> > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) {
> > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) 
> > {
> >  
> >log_info(os)("attempting shared library load of %s", filename);
> >  
> > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, 
> > int ebuflen) {
> >return nullptr;
> >  }
> >  
> > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) {
> > +
> > +  void* result = nullptr;
> > +
> > +  // First try using *.so suffix; failing that, retry with *.a suffix.
> > +  const size_t len = strlen(filename);
> > +  constexpr size_t safety = 3 + 1;
> > +  constexpr size_t bufsize = len + safety;
> > +  char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal);
> > +  strcpy(buf, filename);
> > +  char* const dot = strrchr(buf, '.');
> > +
> > +  assert(dot != nullptr, "Attempting to load a shared object without 
> > extension? %s", filename);
> > +  assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0,
> > +  "Attempting to load a shared object that is neither *.so nor *.a", 
> > filename);
> > +
> > +  sprintf(dot, ".so");
> > +  result = dll_load_inner(buf, ebuf, ebuflen);
> > +
> > +  if (result == nullptr) {
> > +sprintf(dot, ".a");
> > +result = dll_load_inner(buf, ebuf, ebuflen);
> > +  }
> > +
> > +  FREE_C_HEAP_ARRAY(char, buf);
> > +
> > +  return result;
> > +}
> > +
> > ```
> 
> @tstuefe as discussed with @TheRealMDoerr do you think using default argument 
> will help ? Either we pass agent object as 3rd parameter or an empty 
> character buffer(and not const chat*) which would be spcifically used to copy 
> the alternate filename to it using strcpy so that it is reflected in the 
> jvmagent code ?

A third parameter for what?

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829602040


Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]

2023-11-27 Thread Thomas Stuefe
On Mon, 27 Nov 2023 13:23:42 GMT, Thomas Stuefe  wrote:

>> Joachim Kern has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   adopt types
>
> This now causes problems with 
> 
> https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214
> 
> since it removes the possibility of silently alternating the file path inside 
> os::dll_load, which would be the preferred way for AIX to handle *.a shared 
> objects. So this causes even more ifdef AIX to bloom up everywhere.
> 
> A much better solution would have been to mimic stable-handle behavior inside 
> the AIX version of `os::dll_load`. 
> 
> Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple 
> mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) 
> tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp 
> without any changes to shared coding, and would have provided complete 
> coverage for all users of dll_load.

> @tstuefe: Hi Thomas, I'm not sure if I got it. We can make (inode, devid) to 
> a hash, which replaces the dlhandle on return of os::dlload. This hash would 
> of course be the same if the same library is loaded twice. But I do not know 
> how to handle the two real dlhandles.

Why do you need two dlhandles? 

A handle returned from dlopen should be valid for the whole process. If you 
cache that in a hashmap, and for the second caller of os::dll_load() return the 
cached variant, this should work, no?

-

PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1828019003


Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]

2023-11-27 Thread Thomas Stuefe
On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern  wrote:

>> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , 
>> the following test started to fail on AIX :
>> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java;
>> The problem was described in 
>> [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try 
>> of a fix.
>> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) 
>> was necessary.
>> Both fixes just disable the specific subtest on AIX, without correction of 
>> the root cause.
>> The root cause is, that dlopen() on AIX returns different handles every 
>> time, even if you load a library twice. There is no official AIX API 
>> available to get this information on a different way.
>> My proposal is, to use the stat64x API with the fields st_device and 
>> st_inode. After a dlopen() the stat64x() API is called additionally to get 
>> this information which is then stored parallel to the library handle in the 
>> jvmtiAgent. For AIX we then can compare these values instead of the library 
>> handle and get the same functionality as on linux.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   adopt types

This now causes problems with 

https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214

since it removes the possibility of silently alternating the file path inside 
os::dll_load, which would be the preferred way for AIX to handle *.a shared 
objects. So this causes even more ifdef AIX to bloom up everywhere.

A much better solution would have been to mimic stable-handle behavior inside 
the AIX version of `os::dll_load`. 

Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple 
mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) 
tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp without 
any changes to shared coding, and would have provided complete coverage for all 
users of dll_load.

-

PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1827826924


Re: RFR: 8320699: Add parameter to skip progress logging of OutputAnalyzer

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 10:34:02 GMT, Stefan Karlsson  wrote:

> Tests using ProcessTools.executeProcess gets the following output written to 
> stdout:
> [2023-11-24T09:58:16.797540608Z] Gathering output for process 2517117
> [2023-11-24T09:58:23.070781345Z] Waiting for completion for process 2517117
> [2023-11-24T09:58:23.071045055Z] Waiting for completion finished for process 
> 2517117
> 
> This might be good for some use cases and debugging, but some tests spawns a 
> large number of processes and for those this output fills up the log files.
> 
> I propose that we add a way to turn of this output for tests where we find 
> this output to be too noisy.

I'm fine with this, but would prefer a global switch instead of hard-coding the 
value per test and handing it down.

-

PR Comment: https://git.openjdk.org/jdk/pull/16807#issuecomment-1825910975


Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 11:45:25 GMT, suchismith1993  wrote:

> > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the 
> > condition fails for .so files, because i have to reload it again and check 
> > if the .a exists. In the shared code i had repeat less number of lines i 
> > believe. Do you suggest moving lines 1132 to 1139 to another function then ?
> 
> @tstuefe Any suggestion on this ?


--- a/src/hotspot/os/aix/os_aix.cpp
+++ b/src/hotspot/os/aix/os_aix.cpp
@@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* 
buf,
   return true;
 }
 
-void *os::dll_load(const char *filename, char *ebuf, int ebuflen) {
+static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) {
 
   log_info(os)("attempting shared library load of %s", filename);
 
@@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int 
ebuflen) {
   return nullptr;
 }
 
+void* os::dll_load(const char *filename, char *ebuf, int ebuflen) {
+
+  void* result = nullptr;
+
+  // First try using *.so suffix; failing that, retry with *.a suffix.
+  const size_t len = strlen(filename);
+  constexpr size_t safety = 3 + 1;
+  constexpr size_t bufsize = len + safety;
+  char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal);
+  strcpy(buf, filename);
+  char* const dot = strrchr(buf, '.');
+
+  assert(dot != nullptr, "Attempting to load a shared object without 
extension? %s", filename);
+  assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0,
+  "Attempting to load a shared object that is neither *.so nor *.a", 
filename);
+
+  sprintf(dot, ".so");
+  result = dll_load_inner(buf, ebuf, ebuflen);
+
+  if (result == nullptr) {
+sprintf(dot, ".a");
+result = dll_load_inner(buf, ebuf, ebuflen);
+  }
+
+  FREE_C_HEAP_ARRAY(char, buf);
+
+  return result;
+}
+

-

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1825721906


Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete [v2]

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 09:51:07 GMT, Daniel Jeliński  wrote:

>> test/failure_handler/src/share/conf/windows.properties line 60:
>> 
>>> 58: 
>>> 59: native.core.app=cdb
>>> 60: native.core.args=-c ".dump /ma core.%p;qd" -p %p
>> 
>> Just to double check that this is the right option. `/ma` terminates if 
>> there is "inaccessable memory" where `/mA` seems to be continue.  The `/f` 
>> option says all "accessible" memory which I assume it means skips/ignored 
>> inaccessible, is that right?
>
> `/ma` was the option suggested by `.dump`; looking at the documentation, 
> `/mA` indeed seems to be a better choice. I'll update.
> 
> `/f` used to create a [full user-mode 
> dump](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/user-mode-dump-files#full-user-mode-dumps).
>  I couldn't find any information about the behavior on inaccessible memory.
> None of the options specify dumping inaccessible memory; I assume the 
> failures only happen if some memory becomes inaccessible while producing the 
> dump.

Oh, I overlooked this. I think that is right, the old /f minidump generation 
worked around inaccessible memory. They were just "holes".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16806#discussion_r1404160596


Re: RFR: 8320691: Timeout handler on Windows takes 2 hours to complete [v2]

2023-11-24 Thread Thomas Stuefe
On Fri, 24 Nov 2023 09:51:07 GMT, Daniel Jeliński  wrote:

>> test/failure_handler/src/share/conf/windows.properties line 60:
>> 
>>> 58: 
>>> 59: native.core.app=cdb
>>> 60: native.core.args=-c ".dump /ma core.%p;qd" -p %p
>> 
>> Just to double check that this is the right option. `/ma` terminates if 
>> there is "inaccessable memory" where `/mA` seems to be continue.  The `/f` 
>> option says all "accessible" memory which I assume it means skips/ignored 
>> inaccessible, is that right?
>
> `/ma` was the option suggested by `.dump`; looking at the documentation, 
> `/mA` indeed seems to be a better choice. I'll update.
> 
> `/f` used to create a [full user-mode 
> dump](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/user-mode-dump-files#full-user-mode-dumps).
>  I couldn't find any information about the behavior on inaccessible memory.
> None of the options specify dumping inaccessible memory; I assume the 
> failures only happen if some memory becomes inaccessible while producing the 
> dump.

Oh, I overlooked this. I think that is right, the old /f minidump generation 
worked around inaccessible memory. They were just "holes".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16806#discussion_r1404160596


  1   2   3   4   5   6   7   8   9   10   >