Re: RFR: 8332400: isspace argument should be a valid unsigned char [v2]
On Tue, 11 Jun 2024 18:07:10 GMT, Robert Toyonaga wrote: >> ### Summary >> This change ensures we don't get undefined behavior when >> calling[`isspace`](https://pubs.opengroup.org/onlinepubs/007904975/functions/isspace.html). >> `isspace` accepts an `int` argument that "the application shall ensure is >> a character representable as an unsigned char or equal to the value of the >> macro EOF.". >> >> Previously, there was no checking of the values passed to `isspace`. I've >> replaced direct calls with a new wrapper `os::is_space` that performs a >> range check and prevents the possibility of undefined behavior from >> happening. For instances outside of Hotspot, I've added casts to `unsigned >> char`. >> >> **Testing** >> - Added a new test in `test/hotspot/gtest/runtime/test_os.cpp` to check >> `os::is_space` is working correctly. >> - tier1 > > Robert Toyonaga has updated the pull request incrementally with one > additional commit since the last revision: > > Replace wrapper with casts. Yes, unfortunate, but ok. Thank you for doing this. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19567#pullrequestreview-2112563225
Re: RFR: 8330846: Add stacks of mounted virtual threads to the HotSpot thread dump [v8]
On Tue, 4 Jun 2024 13:46:16 GMT, Inigo Mediavilla Saiz wrote: >> src/hotspot/share/runtime/threads.cpp line 1336: >> >>> 1334: oop vt = p->vthread(); >>> 1335: assert(vt != nullptr, ""); >>> 1336: st->print_cr(" \tMounted virtual thread #" >>> INT64_FORMAT, (int64_t)java_lang_Thread::thread_id(vt)); >> >> Please no manual indentation, see remarks above. > > I'm leaving minimal indentation and I will update the code on a new PR to > rely on your changes if it's OK for you @tstuefe Okay, that works. - PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1632931400
Integrated: 8322811: jcmd System.dump_map help info has conflicting statements
On Fri, 7 Jun 2024 10:40:07 GMT, Thomas Stuefe wrote: > @dholmes-ora this is one of yours. > > This was a tad annoying to fix (fix is simple though), since the jcmd > framework has no good way to allow for default parameters that are not used > literally. E.g. in this case, the real value for the file name will contain > the process pid, which of course cannot be hard-coded. > > New output: > > > Syntax : System.dump_map [options] > > Options: (options must be specified using the or = syntax) > -H : [optional] Human readable format (BOOLEAN, false) > -F : [optional] file path (STRING, vm_memory_map_.txt) This pull request has now been integrated. Changeset: 83b34410 Author:Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/83b34410e326c47f357a37c3a337b7dedb8cbbda Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod 8322811: jcmd System.dump_map help info has conflicting statements Reviewed-by: dholmes, kevinw - PR: https://git.openjdk.org/jdk/pull/19596
Re: RFR: 8322811: jcmd System.dump_map help info has conflicting statements
On Mon, 10 Jun 2024 08:35:06 GMT, Kevin Walls wrote: >> @dholmes-ora this is one of yours. >> >> This was a tad annoying to fix (fix is simple though), since the jcmd >> framework has no good way to allow for default parameters that are not used >> literally. E.g. in this case, the real value for the file name will contain >> the process pid, which of course cannot be hard-coded. >> >> New output: >> >> >> Syntax : System.dump_map [options] >> >> Options: (options must be specified using the or = syntax) >> -H : [optional] Human readable format (BOOLEAN, false) >> -F : [optional] file path (STRING, vm_memory_map_.txt) > > Looks good to me, yes non-literal defaults are a bit awkward here. Thanks @kevinjwalls, @dholmes-ora ! - PR Comment: https://git.openjdk.org/jdk/pull/19596#issuecomment-2157761743
Re: RFR: 8332400: isspace argument should be a valid unsigned char
On Mon, 10 Jun 2024 08:20:38 GMT, David Holmes wrote: > > "To use these functions safely with plain chars (or signed chars), the > > argument should first be converted to unsigned char" > > @tstuefe wow! Okay. That is a surprise to me. A cast to unsigned char doesn't > actually do anything. Every char is "representable" as an unsigned char > because it holds a bit pattern between 0x00 and 0xff i.e. the function is > well defined if the incoming int is either EOF (int -1) or else in the range > 0x00 to 0xff. But I did a bit of searching and it seems it comes down to > potential arithmetic operations on the "char" the might behave differently > depending on the signed-ness. :( I was surprised as well. Turns out you can use something for 20+ years and not notice :) - PR Comment: https://git.openjdk.org/jdk/pull/19567#issuecomment-2157711653
RFR: 8322811: jcmd System.dump_map help info has conflicting statements
@dholmes-ora this is one of yours. This was a tad annoying to fix (fix is simple though), since the jcmd framework has no good way to allow for default parameters that are not used literally. E.g. in this case, the real value for the file name will contain the process pid, which of course cannot be hard-coded. New output: Syntax : System.dump_map [options] Options: (options must be specified using the or = syntax) -H : [optional] Human readable format (BOOLEAN, false) -F : [optional] file path (STRING, vm_memory_map_.txt) - Commit messages: - JDK-8322811-jcmd-System-dump_map-help-info-has-conflicting-statements Changes: https://git.openjdk.org/jdk/pull/19596/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19596=00 Issue: https://bugs.openjdk.org/browse/JDK-8322811 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19596/head:pull/19596 PR: https://git.openjdk.org/jdk/pull/19596
Re: RFR: 8332400: isspace argument should be a valid unsigned char
On Thu, 6 Jun 2024 21:21:23 GMT, David Holmes wrote: >> ### Summary >> This change ensures we don't get undefined behavior when >> calling[`isspace`](https://pubs.opengroup.org/onlinepubs/007904975/functions/isspace.html). >> `isspace` accepts an `int` argument that "the application shall ensure is >> a character representable as an unsigned char or equal to the value of the >> macro EOF.". >> >> Previously, there was no checking of the values passed to `isspace`. I've >> replaced direct calls with a new wrapper `os::is_space` that performs a >> range check and prevents the possibility of undefined behavior from >> happening. For instances outside of Hotspot, I've added casts to `unsigned >> char`. >> >> **Testing** >> - Added a new test in `test/hotspot/gtest/runtime/test_os.cpp` to check >> `os::is_space` is working correctly. >> - tier1 > > Sorry I think this is well-intentioned but unnecessary in nearly all cases. > If you pass a char there is no potential problem. Only passing an actual int > could be a problem. @dholmes-ora > Sorry I think this is well-intentioned but unnecessary in nearly all cases. > If you pass a char there is no potential problem. Only passing an actual int > could be a problem. Note that the issue was motivated by an Oracle engineer complaining about me using isspace on a char. That caused me to look up its behavior. Recently, we seem intent on eliminating UB, so why not. That said, I agree that we probably don't need the wrapper. And casting to int feels awkward. I propose - input from trusted sources, e.g. proc fs, don't need casting - input from other sources should be casted to unsigned char (see recommendation here: https://en.cppreference.com/w/cpp/string/byte/isspace "To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char") - PR Comment: https://git.openjdk.org/jdk/pull/19567#issuecomment-2154146793
Re: RFR: 8330846: Add stacks of mounted virtual threads to the HotSpot thread dump [v7]
On Tue, 4 Jun 2024 13:23:59 GMT, Inigo Mediavilla Saiz wrote: >> Note that We are in the process of adding better and saner auto-indentation >> to outputStream. See https://github.com/openjdk/jdk/pull/19461 . I don't >> think that PR is going to take long. >> >> If you don't want to wait, please: >> - As David wrote, use spaces, not tabs >> - Today's pattern for using outputStream indentation is: >> - set up indentation, preferably with streamIndentor, not manually with >> inc/dec >> - then, before printing each line, call stream->indent() >> >> This pattern would also help us to later identify and remove this manual >> indentation pattern if auto-indent becomes a thing. >> >> But really, waiting for https://github.com/openjdk/jdk/pull/19461 would be >> preferable. Then, all you have to do is place a streamIndentor around stack >> printing. Sub-function printing is then indented automatically. > > Thanks ! > > Your PR looks very promising @tstuefe, I would indeed prefer to wait for your > changes as a way to add additional indentation to the stack of the virtual > thread. > > What do you think if I leave the current PR with the indentation that is > already used for the stack of the carrier thread and I create a separate PR > based on yours to add the additional indentation ? Okay for me, if other reviewers are okay with it. - PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1628042032
Re: RFR: 8330846: Add stacks of mounted virtual threads to the HotSpot thread dump [v7]
On Tue, 4 Jun 2024 05:30:36 GMT, David Holmes wrote: >> src/hotspot/share/runtime/javaThread.cpp line 1832: >> >>> 1830: st->print("\t"); >>> 1831: indentation--; >>> 1832: } >> >> Suggestion: >> >> while (indentation-- > 0) { >> st->print("\t"); >> } >> >> Though not sure using `\t` is the best way to indent this as stream >> indentation is based on spaces. > > Actually I'm not sure what this indentation actually does at this location > and its affect on other user's of this API. I would have expected the caller > to set up the necessary indentation in the stream that gets passed in. I see > you inc the indentation but then use the current indentation to insert > multiple tabs - which should not be necessary if the stream indentation works > correctly. ??? Note that We are in the process of adding better and saner auto-indentation to outputStream. See https://github.com/openjdk/jdk/pull/19461 . I don't think that PR is going to take long. If you don't want to wait, please: - As David wrote, use spaces, not tabs - Today's pattern for using outputStream indentation is: - set up indentation, preferably with streamIndentor, not manually with inc/dec - then, before printing each line, call stream->indent() This pattern would also help us to later identify and remove this manual indentation pattern if auto-indent becomes a thing. But really, waiting for https://github.com/openjdk/jdk/pull/19461 would be preferable. Then, all you have to do is place a streamIndentor around stack printing. Sub-function printing is then indented automatically. - PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625622263
Re: RFR: 8330846: Add stacks of mounted virtual threads to the HotSpot thread dump [v8]
On Tue, 4 Jun 2024 07:25:41 GMT, Inigo Mediavilla Saiz wrote: >> Print the stack traces of mounted virtual threads when calling `jcmd >> Thread.print`. > > Inigo Mediavilla Saiz has updated the pull request incrementally with two > additional commits since the last revision: > > - Cleanup test > >- Stop virtualthread >- Remove unneeded imports >- Remove modules that are not needed > - Fix copyright year src/hotspot/share/runtime/threads.cpp line 1334: > 1332: if (thread_oop != nullptr) { > 1333: if (p->is_vthread_mounted()) { > 1334: oop vt = p->vthread(); const src/hotspot/share/runtime/threads.cpp line 1335: > 1333: if (p->is_vthread_mounted()) { > 1334: oop vt = p->vthread(); > 1335: assert(vt != nullptr, ""); Please provide a valid assert string. src/hotspot/share/runtime/threads.cpp line 1336: > 1334: oop vt = p->vthread(); > 1335: assert(vt != nullptr, ""); > 1336: st->print_cr(" \tMounted virtual thread #" INT64_FORMAT, > (int64_t)java_lang_Thread::thread_id(vt)); Please no manual indentation, see remarks above. src/hotspot/share/runtime/threads.cpp line 1337: > 1335: assert(vt != nullptr, ""); > 1336: st->print_cr(" \tMounted virtual thread #" INT64_FORMAT, > (int64_t)java_lang_Thread::thread_id(vt)); > 1337: st->inc(); Use streamIndentor, please test/hotspot/jtreg/serviceability/dcmd/thread/PrintVirtualThreadTest.java line 52: > 50: output.shouldMatch(".*at " + > Pattern.quote(DummyRunnable.class.getName()) + "\\.run.*"); > 51: output.shouldMatch(".*at " + > Pattern.quote(DummyRunnable.class.getName()) + "\\.compute.*"); > 52: shouldFinish.compareAndSet(false, true); Why not just set? - PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625622959 PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625623415 PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625624052 PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625622718 PR Review Comment: https://git.openjdk.org/jdk/pull/19482#discussion_r1625634146
Re: RFR: 8330846: Add stacks of mounted virtual threads to the HotSpot thread dump [v4]
On Mon, 3 Jun 2024 08:30:15 GMT, Inigo Mediavilla Saiz wrote: >> Print the stack traces of mounted virtual threads when calling `jcmd >> Thread.print`. > > Inigo Mediavilla Saiz has updated the pull request incrementally with one > additional commit since the last revision: > > Add missing header I also find the duplication of the stack printing code unfortunate. It would be nice to reuse`JavaThread::print_vthread_stack_on`. I don't understand why it cannot be const? - PR Comment: https://git.openjdk.org/jdk/pull/19482#issuecomment-2144599582
Integrated: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area > that is supposed to work even if the current thread is not attached yet and > there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) > ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based > allocation in the RA constructor, ensuring that only malloc is used. This is > because in the old days the ChunkPools had been allocated from C-Heap and > there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since > [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We > now create chunk pools as global objects, so they are live as soon as the > libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting > ResourceArea, at the very beginning of CreateJavaVM. I made sure that both > allocations and follow-up-chunk-allocation worked even this early in VM life. This pull request has now been integrated. Changeset: ba323b51 Author:Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ba323b515d8821895356507bdb1e94df0776dd5a Stats: 8 lines in 3 files changed: 0 ins; 3 del; 5 mod 8333047: Remove arena-size-workaround in jvmtiUtils.cpp Reviewed-by: jsjolen, sspitsyn - PR: https://git.openjdk.org/jdk/pull/19425
Re: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
On Wed, 29 May 2024 07:42:01 GMT, Johan Sjölen wrote: >> In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area >> that is supposed to work even if the current thread is not attached yet and >> there is no associated Thread or the Thread has no valid ResourceArea. >> >> It contains a workaround: >> >> >> // lazily create the single threaded resource area >> // pick a size which is not a standard since the pools don't exist yet >> _single_threaded_resource_area = new (mtInternal) >> ResourceArea(Chunk::non_pool_size); >> >> >> It specifies a non-standard chunk size to circumvent the chunk-pool-based >> allocation in the RA constructor, ensuring that only malloc is used. This is >> because in the old days the ChunkPools had been allocated from C-Heap and >> there was a time window when no chunk pools were live yet. >> >> This is quirky and a bit ugly. It is also unnecessary since >> [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). >> We now create chunk pools as global objects, so they are live as soon as the >> libjvm C++ initialization ran. We can remove this workaround and the comment. >> >> --- >> >> Tests: GHAs. >> I also manually called this function, and allocated from the resulting >> ResourceArea, at the very beginning of CreateJavaVM. I made sure that both >> allocations and follow-up-chunk-allocation worked even this early in VM life. > > Today, the ChunkPools are allocated before main through static > initialization. That means that the ChunkPools exists when main starts > executing, so this is safe. Thanks @jdksjolen and @sspitsyn ! - PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2141338912
Re: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive
On Wed, 29 May 2024 18:12:25 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) > replacing all naked uses for ```UseSharedSpaces``` with > ```CDSConfig::is_using_archive```. > > Testing: > - [x] Tier 1 with GHA. > > Thanks, > Sonia Looks good, minus the nit @dholmes-ora mentioned. Please make sure Copyrights are updated. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19463#pullrequestreview-2089995893
Re: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage
On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations > are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. > Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime > error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in > JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, > jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() > src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() > src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in > VMThread::evaluate_operation(VM_Operation*) > src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) > src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() > src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() > src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() > src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry > src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread > (/lib64/libpthread.so.0+0xa6e9) (BuildId: > 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) > (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime > error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in > JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, > jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in VM_GetObjectMonitorUsage::doit() > src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... I agree with David on the 24hr thing. We want others to stick to that rule, then we should keep the rule ourselves. The rule takes the pressure out of monitoring the patch flow. But @TheRealMDoerr is right, the only logical way we can see a nullptr here is if there are no waiters/notifiers. A better solution may have been to move the memsets into their respective count > 0 conditions. - PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2140003043
Re: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area > that is supposed to work even if the current thread is not attached yet and > there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) > ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based > allocation in the RA constructor, ensuring that only malloc is used. This is > because in the old days the ChunkPools had been allocated from C-Heap and > there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since > [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We > now create chunk pools as global objects, so they are live as soon as the > libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting > ResourceArea, at the very beginning of CreateJavaVM. I made sure that both > allocations and follow-up-chunk-allocation worked even this early in VM life. @jdksjolen could you take a look? You know the Arena coding behind it, and this PR is, in a very circumvent way, one of the prerequisites for NMT simplifications I plan. - PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2135835425
Re: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area > that is supposed to work even if the current thread is not attached yet and > there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) > ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based > allocation in the RA constructor, ensuring that only malloc is used. This is > because in the old days the ChunkPools had been allocated from C-Heap and > there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since > [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We > now create chunk pools as global objects, so they are live as soon as the > libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting > ResourceArea, at the very beginning of CreateJavaVM. I made sure that both > allocations and follow-up-chunk-allocation worked even this early in VM life. x86 problem unrelated - PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2135680548
RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea. It contains a workaround: // lazily create the single threaded resource area // pick a size which is not a standard since the pools don't exist yet _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. --- Tests: GHAs. I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. - Commit messages: - copyrights - remove non_pool_size - start Changes: https://git.openjdk.org/jdk/pull/19425/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19425=00 Issue: https://bugs.openjdk.org/browse/JDK-8333047 Stats: 8 lines in 3 files changed: 0 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19425/head:pull/19425 PR: https://git.openjdk.org/jdk/pull/19425
Integrated: 8332042: Move MEMFLAGS to its own include file
On Fri, 10 May 2024 09:06:08 GMT, Thomas Stuefe wrote: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the > allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without > needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get > MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., > fixing all places where we rely on indirect includes), I added memflags.hpp > to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild > to aarch64, fastdebug minimal This pull request has now been integrated. Changeset: 95a60131 Author:Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/95a601316de06b4b0fbf6e3cbe5d2a1ca978 Stats: 201 lines in 25 files changed: 99 ins; 66 del; 36 mod 8332042: Move MEMFLAGS to its own include file Reviewed-by: jsjolen, stefank - PR: https://git.openjdk.org/jdk/pull/19172
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v3]
On Tue, 14 May 2024 07:19:32 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the >> allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without >> needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that where only needed to get >> MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies >> (e.g., fixing all places where we rely on indirect includes), I added >> memflags.hpp to allocation.hpp. >> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug >> crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional > commit since the last revision: > > Feedback StefanK Thanks @afshin-zafari @stefank @kimbarrett @jdksjolen - PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2110464347
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
On Mon, 13 May 2024 10:17:57 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update mallocLimit.hpp > > Changes requested by stefank (Reviewer). @stefank New version, hopefully addressed all your remarks. Thanks! > src/hotspot/share/nmt/memflags.cpp line 27: > >> 25: #include "precompiled.hpp" >> 26: >> 27: #include "nmt/memflags.hpp" > > There should be no blankline between precompiled.hpp and the rest of the > includes. I removed the file. - PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2109454676 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599498716
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v3]
> MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the > allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without > needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get > MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., > fixing all places where we rely on indirect includes), I added memflags.hpp > to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild > to aarch64, fastdebug minimal Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback StefanK - Changes: - all: https://git.openjdk.org/jdk/pull/19172/files - new: https://git.openjdk.org/jdk/pull/19172/files/42361558..2fc98923 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19172=02 - incr: https://webrevs.openjdk.org/?repo=jdk=19172=01-02 Stats: 41 lines in 4 files changed: 7 ins; 34 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
On Mon, 13 May 2024 15:48:43 GMT, Stefan Karlsson wrote: >> I tend to agree with that. My earlier question still stands: is there a >> better place to put it? Right now the "enforced with code" in a stand-alone >> file doesn't tell me "why" this rule is important. > > If you want to keep the static_assert it in the .cpp file, then I won't block > that. > Could you instead put the static_assert near the code that will break? I like that. I will do that. - PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599476559
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
On Mon, 13 May 2024 14:47:18 GMT, Stefan Karlsson wrote: >> I don't feel like starting that particular bike shedding discussion :) But >> sure, sometime in the future we should do this. Here, I want it to be a >> simple renaming change. > > Right. That's why I prefixed this with "Open-ended comment/question", trying > to make it super clear that it wasn't intended as a request for this PR, but > rather a way to at least plant the seed of an idea that we might want to fix > this eyesore. I agree with you on the eyesore. MEMFLAGS does not follow any established convention, the implied plural is strange (its just one flag, not a set of), etc. We will change it sometime in the future. - PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599478327
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
On Mon, 13 May 2024 10:13:50 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update mallocLimit.hpp > > src/hotspot/share/nmt/memflags.cpp line 31: > >> 29: >> 30: // Extra insurance that MEMFLAGS truly has the same size as uint8_t. >> 31: STATIC_ASSERT(sizeof(MEMFLAGS) == sizeof(uint8_t)); > > I think you can remove this entire .cpp file. There's no need to check the > size of an enum with a specified base type. I rather have this explicit check. If MEMFLAGS>1byte, things break, and I would like to make that explicit. That said, I can move this static assert to the header. I just wanted to avoid including debug.hpp. My original intent was for this cpp file to be the place in the future for any MEMFLAGS related utility functions, e.g. to-and-from-string conversations. - PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598583814
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
On Mon, 13 May 2024 10:16:36 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update mallocLimit.hpp > > src/hotspot/share/nmt/memflags.hpp line 30: > >> 28: #include "utilities/globalDefinitions.hpp" >> 29: >> 30: #define MEMORY_TYPES_DO(f) >> \ > > Open-ended comment/question: We call it MEMORY_TYPE and mt, but then we call > the type MEMFLAGS (with a completely non-standard UPPERCASE style). Maybe it > is time to rename MEMFLAGS? I don't feel like starting that particular bike shedding discussion :) But sure, sometime in the future we should do this. Here, I want it to be a simple renaming change. - PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598580049
Re: RFR: 8332042: Move MEMFLAGS to its own include file [v2]
> MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the > allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without > needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get > MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., > fixing all places where we rely on indirect includes), I added memflags.hpp > to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild > to aarch64, fastdebug minimal Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update mallocLimit.hpp - Changes: - all: https://git.openjdk.org/jdk/pull/19172/files - new: https://git.openjdk.org/jdk/pull/19172/files/9a27048a..42361558 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19172=01 - incr: https://webrevs.openjdk.org/?repo=jdk=19172=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172
Re: RFR: 8332042: Move MEMFLAGS to its own include file
On Fri, 10 May 2024 09:06:08 GMT, Thomas Stuefe wrote: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the > allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without > needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get > MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., > fixing all places where we rely on indirect includes), I added memflags.hpp > to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild > to aarch64, fastdebug minimal Ping @afshin-zafari @jdksjolen @gerard-ziemski - PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2104357678
RFR: 8332042: Move MEMFLAGS to its own include file
MEMFLAGS, as well as its enum constants, should live in its own include. The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. --- This patch moves the enum to its new file. It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. I tested (built) on: - MacOS aarch64, no precompiled headers, fastdebug - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal - Commit messages: - Update g1MonotonicArena.hpp - Update g1MonotonicArena.hpp - NMT-factor-out-memflags Changes: https://git.openjdk.org/jdk/pull/19172/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19172=00 Issue: https://bugs.openjdk.org/browse/JDK-8332042 Stats: 225 lines in 25 files changed: 124 ins; 64 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172
Re: RFR: 8322043: HeapDumper should use parallel dump by default [v3]
On Wed, 17 Apr 2024 07:22:55 GMT, David Holmes wrote: >> I think it makes the code more flexible - it allows to distinguish between >> "use default value" and "I don't care" cases. > > I'm not sure it is a worthwhile distinction. Not passing an actual parameter > means "I don't care - take the default". I agree with @dholmes-ora. Let's keep things simple. - PR Review Comment: https://git.openjdk.org/jdk/pull/18748#discussion_r1568833440
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Sat, 13 Apr 2024 18:29:59 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains ten additional >> commits since the last revision: >> >> - Merge branch 'master' into jdk-8261242-is-containerized-fix >> - jcheck fixes >> - Fix tests >> - Implement Metrics.isContainerized() >> - Some clean-up >> - Drop cgroups testing on plain Linux >> - Implement fall-back logic for non-ro controller mounts >> - Make find_ro static and local to compilation unit >> - 8261242: [Linux] OSContainer::is_containerized() returns true > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > >> 168: } >> 169: } >> 170: return false; > > An alternative, simpler, no need for modifying source string: > > static bool find_ro_opt(const char* o) { > return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); > } Please disregard my comment. Albeit longer, your version is clearer to read and more fault tolerant. > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > >> 349: // >> 350: // We collect the read only mount option in the cgroup infos so as >> to have that >> 351: // info ready when determining is_containerized(). > > Here, and in other places: a comment indicating the line format we scan would > be appreciated, possibly with argument numbers. Saves the casual code reader > from looking into proc man page. Even just pasting the example line for proc > manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) > (but with order adapted to your scanf call, they count major:minor as one) Trying to parse the `%s%*[^-]-` So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- parses everything that is not a dash, until we encounter the dash? Then we eat the dash? This is to skip the optionals? - PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567754861 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567767209
Re: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2]
On Thu, 11 Apr 2024 12:08:02 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows >> it to figure out whether the JVM is actually running inside a container >> (`podman`, `docker`, `crio`), or with some other means that enforces >> memory/cpu limits by means of the cgroup filesystem. If neither of those >> conditions hold, the JVM runs in not containerized mode, addressing the >> issue described in the JBS tracker. For example, on my Linux system >> `is_containerized() == false" is being indicated with the following trace >> log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false >> because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new >> (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build >> 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup >> controllers typically get mounted read only into containers. Note that the >> current container tests assert that `OSContainer::is_containerized() == >> true` in various tests. Therefore, using the heuristic of "is any memory or >> cpu limit present" isn't sufficient. I had considered that in an earlier >> iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of >> claiming a containerized system being present when it's actually just a >> regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 >> (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains ten additional > commits since the last revision: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - Fix tests > - Implement Metrics.isContainerized() > - Some clean-up > - Drop cgroups testing on plain Linux > - Implement fall-back logic for non-ro controller mounts > - Make find_ro static and local to compilation unit > - 8261242: [Linux] OSContainer::is_containerized() returns true I am not enough of a container expert to judge if the basic approach is right - I trust you on this. This is just a technical code review. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 170: > 168: } > 169: } > 170: return false; An alternative, simpler, no need for modifying source string: static bool find_ro_opt(const char* o) { return strcmp(o, "ro") || strstr(o, ",ro") || strstr(o, "ro,"); } src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: > 349: // > 350: // We collect the read only mount option in the cgroup infos so as > to have that > 351: // info ready when determining is_containerized(). Here, and in other places: a comment indicating the line format we scan would be appreciated, possibly with argument numbers. Saves the casual code reader from looking into proc man page. Even just pasting the example line for proc manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but with order adapted to your scanf call, they count major:minor as one) src/hotspot/os/linux/osContainer_linux.cpp line 78: > 76: const char *reason; > 77: bool any_mem_cpu_limit_present = false; > 78: bool ctrl_ro = cgroup_subsystem->is_containerized(); nit: naming? what does ctrl mean in this case? Maybe use "cgroup_is_containerized"? src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375: > 373: if (!c.isContainerized()) { > 374: ostream.println(INDENT + "System not containerized."); > 375: return; Why return here? Would this not cut the output short in the non-containerized case? And if this not intended, the not-containerized-`-XshowSettings:system` test below should test and catch this (e.g. scan for CPU set) - PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-1999328503 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1564182879 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567756663 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567774124 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1567779248
Re: RFR: 8330171: Lazy W^X switch implementation
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin wrote: > An alternative for preemptively switching the W^X thread mode on macOS with > an AArch64 CPU. This implementation triggers the switch in response to the > SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this > approach, it is now feasible to eliminate all WX guards and avoid potentially > costly operations. However, no significant improvement or degradation in > performance has been observed. Additionally, considering the issue with > AsyncGetCallTrace, the patched JVM has been successfully operated with > [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and > [async-profiler](https://github.com/async-profiler/async-profiler). > > Additional testing: > - [x] MacOS AArch64 server fastdebug *gtets* > - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4* > - [ ] Benchmarking > > @apangin and @parttimenerd could you please check the patch on your > scenarios?? I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call? In other words, how fine granular can we get without incurring too high a cost? - PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2053721713
Re: RFR: 8322043: HeapDumper should use parallel dump by default
On Fri, 12 Apr 2024 18:50:37 GMT, Alex Menkov wrote: > > I am curious: what is the memory overhead for parallel mode, and (I am not > > familiar with the logic) how many threads are involved? Is the number of > > thread bounded? > > I ask because, especially for the OnOOM handling, we may already be at a > > limit memory-wise. Starting to swap will probably be worse than running > > single-threaded. > > Good question. It think it's several MB per each additional thread (1MB > output buffer, DumperClassCacheTable - 1031 elements max, element size > depends on class field numbers, if HeapDumpGzipLevel is set, some buffers for > gzip compressors) Number of threads by default is min of > `os::initial_active_processor_count() * 3 / 8` and number of GC workers. For the OOM case, I would probably make it somehow dependent on os::free_memory() then. - PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2053517286
Re: RFR: 8322043: HeapDumper should use parallel dump by default
On Fri, 12 Apr 2024 02:17:34 GMT, Alex Menkov wrote: > The fix makes VM heap dumping parallel by default. > `jcmd GC.heap_dump` and `jmap -dump` had parallel dumping by default, the fix > affects `HotSpotDiagnosticMXBean.dumpHeap()`, `-XX:+HeapDumpBeforeFullGC`, > `-XX:+HeapDumpAfterFullGC` and `-XX:+HeapDumpOnOutOfMemoryError`. > > Testing: > - manually tested different heap dump scenarios with `-Xlog:heapdump`; > - tier1,tier2,hs-tier5-svc; > - all reg.tests that use heap dump. I am curious: what is the memory overhead for parallel mode, and (I am not familiar with the logic) how many threads are involved? Is the number of thread bounded? I ask because, especially for the OnOOM handling, we may already be at a limit memory-wise. Starting to swap will probably be worse than running single-threaded. - PR Comment: https://git.openjdk.org/jdk/pull/18748#issuecomment-2051046673
Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]
On Thu, 28 Mar 2024 17:15:26 GMT, Kevin Walls wrote: >>> In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard >>> since it guards a whole swathe of switches that we may instruct the >>> customer to enable. Once enabled, my experience is that >>> UnlockDiagnosticVMOptions often lingers around. It is not unusual for >>> customer scenarios to have set +UnlockDiagnosticVMOptions because of some >>> years ago support cases. >> >> I think we also need to consider the flip side of this argument. Is this >> something that some customers might want to always enable, but don't want to >> always have UnlockDiagnosticVMOptions enabled. A new command line flag would >> be needed in that case. >> >> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of >> diagnostic command line flags? Do we have examples of it enabling a hotspot >> feature that does not also require setting a diagnostic command line flag? > >> I think we also need to consider the flip side of this argument. Is this >> something that some customers might want to always enable, but don't want to >> always have UnlockDiagnosticVMOptions enabled. A new command line flag would >> be needed in that case. >> >> Also, isn't UnlockDiagnosticVMOptions meant for enabling the use of >> diagnostic command line flags? Do we have examples of it enabling a hotspot >> feature that does not also require setting a diagnostic command line flag? > > Thanks Chris - that is correct now I check the wording, > UnlockDiagnosticVMOptions: "Enable normal processing of flags relating to > field diagnostics" > > Yes it makes flags which are DIAGNOSTIC available at the command-line. We > have UnlockExperimentalVMOptions also. > > My original reason for the guard, is that the risk of fooling the "good oop" > check is that we could possibly crash (so that's answering your other > question). Inspecting an arbitrary pointer that looks enough like a Java > heap object to fool the test, means it might try and resolve bad addresses > for Klass pointer or field data. > > I also have a stress test which examines every address in a Java heap to test > for crashes. That's obviously long-running as it does a jcmd for every > iteration. But I can't currently get a crash, trying G1, ZGC, > ZGCGenerational. > > Having a global guard to prevent usage of VM.inspect may be needed less than > I thought, but will wait on the other thread before saying that is really the > case. We could have a new -XX flag to guard it (whether specific to this > command or a more general UnlockDiagnosticFeatures. I would definitely stick > to this being in all builds, as we frequently "debug" non-debug builds. > > Also, for your comment above about the new version of dbg_is_good_oop: this > was added because although there are not many callers of it, I did not want > to upset them. During my earlier testing the detailed version of this method > did upset something, although I cannot reproduce that today. > > I will rerun some tests on these items... > @kevinjwalls Thanks for clarifying the relevant behaviours and code paths as > well as identifying a few places where documentation might help devs avoid > tripping over any issues. Much appreciated. @kevinjwalls I am fine with this as well. Thank you for answering our questions. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2042743210
Re: RFR: 8327505: Test com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.java fails
On Tue, 19 Mar 2024 16:23:27 GMT, Kevin Walls wrote: > Client.java has a fixed 30-second timeout on the CountDownLatch to wait for > 10 notifications. > > If it fails, you can't tell if CountDownLatch.await threw, or returned false > and the app threw InterruptedException, due to the way Client.java handles > these. > > Seems most likely the 30 second wait expired, as we are dealing with -Xcomp > failures in a debug build. Passing runs can take a few seconds, but can be > 25 seconds. > > Increasing the timeout and tidying up the handling so we can see the specific > reason in future. Looks good! - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18381#pullrequestreview-1965321788
Re: RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v10]
On Wed, 27 Mar 2024 13:44:42 GMT, Matthias Baesken wrote: >> Currently jcmd command GC.heap_dump only works with an additionally provided >> file name. >> Syntax : GC.heap_dump [options] >> >> In case the JVM has the XX - flag HeapDumpPath set, we should support an >> additional mode where the is optional. >> In case the filename is NOT set, we take the HeapDumpPath (file or >> directory); >> >> new syntax : >> GC.heap_dump [options] .. has precedence over second option >> GC.heap_dump [options] …in case -XX: HeapDumpPath=p is set >> >> This would be a simplification e.g. for support cases where a filename or >> directory is set at JVM startup with -XX: HeapDumpPath=p and writing to the >> path is intended/recommended for usage also in the jcmd case. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > adjust java.1 man page To weight in on the side of this patch, in most customer scenarios I have seen, there is just one location for heap dumps. Using the same location for OOM and for manual heap dumps seems logical to me. @dholmes-ora > Notwithstanding that there may be cases where this change would be > convenient, I really don't like this coupling between the jcmd behaviour and > a -XX flag that is intended for something else. It doesn't completely mesh > with the jcmd usage and other options, and the documentation story is quite > complicated. Wouldn't this just be a case of changing a flag description? As luck has it, the flag already has a generic name that is not tied to OOMs. - PR Comment: https://git.openjdk.org/jdk/pull/18190#issuecomment-2024535044
Re: RFR: 8318026: jcmd should provide access to detailed JVM object information [v9]
On Tue, 26 Mar 2024 21:55:38 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.inspect" to implement access to detailed JVM object >> information. >> >> Not recommended for live production use. Requires UnlockDiagnosticVMOptions >> and not included in jcmd help output, to remind us this is not a >> general-purpose customer-facing tool. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > Undo include I am still feeling uneasy about this new command. I can see its potential usefulness, but as stated before would prefer it being limited to debug-only JVMs. - jcmd can be exposed remotely, e.g., via jmx. I am not sure whether other solutions exist, but exposing jcmd beyond the immediate box the JVM runs on is something many folks want, and that is technically easy to do. So, customer-local solutions may exist for that, and the reach of jcmd may be larger than we think. - the underlying functionality for print_location was written with debugging and error analysis in mind. We keep adding functionality there. I don't think developers adding to that function have in mind that this functionality may be exposed, possibly remotely. In my opinion, UnlockDiagnosticVMOptions is not a good enough safeguard since it guards a whole swathe of switches that we may instruct the customer to enable. Once enabled, my experience is that UnlockDiagnosticVMOptions often lingers around. It is not unusual for customer scenarios to have set +UnlockDiagnosticVMOptions because of some years ago support cases. If others feel this command is safe enough, I'll shut up and be quiet, since I cannot think up a concrete attack scenario. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-2022310874
Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]
On Mon, 18 Mar 2024 13:14:41 GMT, Richard Reingruber wrote: >> This pr changes `JfrJvmtiAgent::retransform_classes()` and >> `jfr_set_enabled` to switch to `WXWrite` before transitioning to the vm. >> >> Testing: >> make test TEST=jdk/jfr/event/runtime/TestClassLoadEvent.java >> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync >> make test TEST=compiler/intrinsics/klass/CastNullCheckDroppingsTest.java >> TEST_VM_OPTS=-XX:+AssertWXAtThreadSync >> >> More tests are pending. > > Richard Reingruber has updated the pull request with a new target base due to > a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains three additional > commits since the last revision: > > - Merge branch 'master' into 8327990_jfr_enters_vm_without_wxwrite > - Set WXWrite at more locations > - Switch to WXWrite before entering the vm I think this patch makes sense, and does not compete with a long-term solution. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18238#pullrequestreview-1947069985
Re: RFR: 8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync [v3]
On Tue, 19 Mar 2024 12:17:22 GMT, David Holmes wrote: >> Instead, could we tag code that needs one or the other, keep track of the >> current WX state in thread-local memory, and flip WX only when we know we >> need to? The first part we already do. I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation. (Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow). > > That's very odd. The example there doesn't even involve MAP_JIT memory, so > > what does it have to do with WX? > > @theRealAph that is the mystery we hope will be resolved once we know the > nature of the underlying OS bug. Somehow switching to exec mode > fixes/works-around the issue. I can imagine a missing conditional to check if > the region is MAP_JIT. > > > Changing WX at VM state transitions is a form of temporal coupling, a > > classic design smell that has caused problems for decades. > > The original introducers of WXEnable made the decision that the VM should be > in WRITE mode unless it needs EXEC. That is the state we are presently trying > to achieve with this change. If that original design choice is wrong then ... > > > Instead, could we tag code that needs one or the other, keep track of the > > current WX state in thread-local memory, and flip WX only when we know we > > need to? > > And I've asked about this every time a missing WXEnable has had to be added. > We seem to be generically able to describe what kind of code needs which > mode, but we seem to struggle to pin it down. Though that is what > https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing. > > > That'd (by definition) reduce the number of transitions to the minimum if > > we were through. > > Not necessarily. It may well remove some transitions from paths that don't > need it, but if you move the state change too low down the call chain you > could end up transitioning much more often in code that does need it e.g. if > a transitioning method is called in a loop. Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway. - PR Comment: https://git.openjdk.org/jdk/pull/18238#issuecomment-2007469410
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v7]
On Tue, 5 Mar 2024 11:31:13 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.debug" to implement access to a useful set of the >> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...". >> >> Not recommended for live production use. Calling these "debug" utilities, >> and not including them in the jcmd help output, is to remind us they are not >> general customer-facing tools. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > Test update Hi Kevin, > > (If you find existing jcmds a mess, feel free to suggest changes as > appropriate.) you touch on a problem here, and why I think adding commands should be done more carefully (and I am guilty of adding commands too). Once these commands are rolled out, they find themselves in blogs, knowledge bases, scripts that are reused for different JDK releases, and so on. It is difficult to change them post-release. That is why good names are important. Cheers, Thomas - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1980807020
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]
On Mon, 4 Mar 2024 22:02:53 GMT, Kevin Walls wrote: >> Kevin Walls has updated the pull request incrementally with three additional >> commits since the last revision: >> >> - Usage correction >> - Help to clarify this is VM inspection. Comment to relate source to >> debug.cpp. >> - jcheck trailing whitespace > > Hi Thomas @tstuefe - > > Security concerns certainly needed some thought. We have remote access to > DiagnosticCommands over JMX. I don't see a particular need for that for this > command. I think it was keeping the DCmd flagged as hidden that hides it > from the list of dcmds available in that way. So we have the attach api > controls using local user identity, as we do for everything else. > > Yes, "events" is not that useful, I should remove it. I was taking the > useful parts of debug.cpp, and although the events were in VM.info, I had > missed that since your JDK-8224600, VM.events is its own command now! > > Thread.print with stacks, vs "VM.debug threads" which is just a thread list, > so you have e.g. JavaThread*, name, native id, stack range. > > "find": Usage at the moment, would likely be MessageBoxOnError, and a native > debugger to get the native stacktrace and parameters that include an object > reference you care about. It might be other jcmds if e.g. events capture a > relevant problematic pointer, but likely it involves a native debugger. > Using jcmd behaves better, showing the output where you run jcmd, not in the > VM's current output as the debug.cpp utils do. > Native debugger syntax to call abritrary functions can be awkward, > particularly on Windows, so the jcmd should be a better experience. > > This might be more compelling in post-mortem usage. Am working on that. > i.e. jcmd on a core file. But I am saying it offers some value today. > > The class/method finders, I've heard some enthusiasm for their inclusion. We > don't want to encourage overlap but yes we do have some overlapping jcmds. > > > This is an umbrella but I don't think it's vague. VMDebugDcmd is for > inspecting VM state. It's inspired by debug.cpp utilities, does not need to > implement all of them, but does aim to make them more accessible (I will > assert that they are not well known, which is hard to prove.) > > Do we have a problem with jcmd feature creep? If anything has crept too far > it can be addressed. Looks like the DCmd classes have approximately doubled > since jdk8u but this looks like growth in a good way. > > Thanks > Kevin Hi Kevin, @kevinjwalls , thank you for your explanations. Please find answers inline. > Security concerns certainly needed some thought. We have remote access to > DiagnosticCommands over JMX. I don't see a particular need for that for this > command. I think it was keeping the DCmd flagged as hidden that hides it from > the list of dcmds available in that way. So we have the attach api controls > using local user identity, as we do for everything else. > > Yes, "events" is not that useful, I should remove it. I was taking the useful > parts of debug.cpp, and although the events were in VM.info, I had missed > that since your JDK-8224600, VM.events is its own command now! > > Thread.print with stacks, vs "VM.debug threads" which is just a thread list, > so you have e.g. JavaThread*, name, native id, stack range. > > "find": Usage at the moment, would likely be MessageBoxOnError, and a native > debugger to get the native stacktrace and parameters that include an object > reference you care about. It might be other jcmds if e.g. events capture a > relevant problematic pointer, but likely it involves a native debugger. Using > jcmd behaves better, showing the output where you run jcmd, not in the VM's > current output as the debug.cpp utils do. Native debugger syntax to call > abritrary functions can be awkward, particularly on Windows, so the jcmd > should be a better experience. I remain sceptic here, because in my experience, once you start poking at the JVM innards at this level, I guess you will be quickly at the limit of what this command can do for you and need to attach a debugger anyway. Could this be an own command, e,g, `VM.inspect`, and possibly limited to debug VMs? Do we really need this feature in production? > > This might be more compelling in post-mortem usage. Am working on that. i.e. > jcmd on a core file. But I am saying it offers some value today. Don't we have jhsdb for that? > > The class/method finders, I've heard some enthusiasm for their inclusion. We > don't want to encourage overlap but yes we do have some overlapping jcmds. Yeah, I can see this being useful. > > This is an umbrella but I don't think it's vague. VMDebugDcmd is for > inspecting VM state. It's inspired by debug.cpp utilities, does not need to > implement all of them, but does aim to make them more accessible (I will > assert that they are not well known, which is hard to prove.) > > Do we have
Re: RFR: 8318026: jcmd should provide access to low-level JVM debug information [v4]
On Mon, 4 Mar 2024 15:10:12 GMT, Kevin Walls wrote: >> Introduce the jcmd "VM.debug" to implement access to a useful set of the >> established debug.cpp utilities, with "jcmd PID VM.debug subcommand ...". >> >> Not recommended for live production use. Calling these "debug" utilities, >> and not including them in the jcmd help output, is to remind us they are not >> general customer-facing tools. > > Kevin Walls has updated the pull request incrementally with three additional > commits since the last revision: > > - Usage correction > - Help to clarify this is VM inspection. Comment to relate source to > debug.cpp. > - jcheck trailing whitespace Of the sub commands given in the CSR: eventsHotSpot JVM Event log threads Thread list find ADDRESS Describe an address given by the argument findclass CLASS_PATTERN FLAGS findmethod CLASS_PATTERN METHOD_PATTERN FLAGS 1) `events` is redundant (we have VM.events, and events are printed as part of VM.info) 2) `threads` is handled by Thread.info 3) I am still puzzled about `find`. This is to ask a running VM about an arbitrary address. But how do you get such an address? Usually only via debugging. Then I am already attached with a system debugger. Under which circumstances would this be needed? Can you give a usage example? 4) and 5) shared some overlap with VM.classes, VM.class_hierarchy, VM.metaspace and various Compiler.xx commands. I am mostly worried about adding such a low-level debug command to jcmd. Especially in such a vague umbrella form. Experiences show that these commands get extended easily, often without CSR. So its an open door for feature creep. I mostly worry about unforeseen security consequences, especially if jcmd reach is extended across machine boundaries. Weighted against that, the benefits seem a bit slim to me. Again, real-world use cases would help. - PR Comment: https://git.openjdk.org/jdk/pull/17655#issuecomment-1976941359
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v26]
On Mon, 26 Feb 2024 11:24:13 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > >Address review comments Okay - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1902581556
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v25]
On Tue, 20 Feb 2024 09:27:15 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > >remove space Changes requested by stuefe (Reviewer). src/hotspot/os/aix/os_aix.cpp line 1167: > 1165: Load "libfilename.so" first, then load libfilename.a, on failure. > 1166: In OpenJ9, the libary with .so extension is loaded first and then .a > extension, on failure. > 1167: */ Wrong block comment, but the comment itself is also off now. Suggestion: Suggestion: // Load library named // If filename matches .so, and loading fails, repeat with .a. src/hotspot/os/aix/os_aix.cpp line 1173: > 1171: char* const pointer_to_dot = strrchr(file_path, '.'); > 1172: char const *old_extension = ".so"; > 1173: char const *new_extension = ".a"; Suggestion: char* const file_path = strdup(filename); char* const pointer_to_dot = strrchr(file_path, '.'); const char old_extension[] = ".so"; const char new_extension[] = ".a"; STATIC_ASSERT(sizeof(old_exception) >= sizeof(new_exception)); and remove runtime-assert below src/hotspot/os/aix/os_aix.cpp line 1178: > 1176: FREE_C_HEAP_ARRAY(char, file_path); > 1177: return result; > 1178: } I would remove this whole section since it's a functional change not covered by the issue description and unnecessary for your fix. It may also be wrong: loading shared objects without extension may be perfectly valid. The extension is just a convention. src/hotspot/os/aix/os_aix.cpp line 1183: > 1181: // If the load fails,we try to reload by changing the extension to .a > for .so files only. > 1182: // Shared object in .so format dont have braces, hence they get > removed for archives with members. > 1183: if (result == nullptr) { Suggestion: if (result == nullptr && pointer_to_dot != nullptr && strcmp(pointer_to_dot, old_extension) == 0) { src/hotspot/os/aix/os_aix.cpp line 1184: > 1182: // Shared object in .so format dont have braces, hence they get > removed for archives with members. > 1183: if (result == nullptr) { > 1184: assert(strlen(new_extension) < strlen(old_extension), "New > extension length must be less than existing one"); Can be removed if you do the STATIC_ASSERT suggested above. src/hotspot/os/aix/os_aix.cpp line 1186: > 1184: assert(strlen(new_extension) < strlen(old_extension), "New > extension length must be less than existing one"); > 1185: if (strcmp(pointer_to_dot, old_extension) == 0) { > 1186: sprintf(pointer_to_dot, "%s", new_extension); Use os::snprintf, and limit output buffer by size of old extension. - PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1899612274 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757595 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754431 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501754774 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757736 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757772 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1501757844
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v23]
On Mon, 19 Feb 2024 13:25:28 GMT, Joachim Kern wrote: >> Thanks for the detailed explanation @JoKern65 . Do then in this errno check >> may not be necessary ? or can we still set errno and access it some way ? > > In this special case here I would not use errno, but the string returned in > ebuf, in case the result is nullptr. Argh, I keep forgetting `dlopen` does not set errno (at least its not guaranteed by the standard). Ok, in this case, I would not analyze the string either because it may be locale-dependent. Never mind then; I'd thought this would be easy. One could probably use load() instead of dlopen() on AIX, but that change would require more investigations to see if load() and dlopen() are equivalent. Okay, in that case just skip the errno test and call the second dlopen directly. Lets hope for the best then. - PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1494586044
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Fri, 16 Feb 2024 12:25:39 GMT, Suchismith Roy wrote: > > > > Hi, > > > > some remarks: > > > > > > > > * there is no need for a copy for the first call to dll_load_library. > > > > Just hand in the string 1:1. > > > > * I would only do the *.a fallback loading if the error indicates that > > > > the *.so file had not been there. So, only if EACCESS or ENOENT; in all > > > > other cases I would not do the fallback. E.g. if the *.so file cannot > > > > be loaded due to a header mismatch. See > > > > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines > > > > * Please use os::strdup. > > > > * Please assert that the replacement string is smaller than the > > > > original string (which it should be, *.so is longer than *.a, but this > > > > is insurance against anyone changing the code in the future) > > > > > > > > Thank you, Thomas > > > > > > > > > Sure working on them. May i know why we are using the load routine in the > > > 2nd point ? . Currently we do a *.a fallback only when dlopen fails. Does > > > load function save some steps here ? > > > > > > I don't understand the question, sorry. > > What I mean is when the first dlopen fails AND its error indicates the > > shared library had been missing, only then attempt the *.a fallback. > > I see. I think i was referred to the init routine in the link. Ah, okay. No, IBMs dlopen page for AIX refers to this page for the list of possible error codes; I assume dlopen just uses these functions under the hood. > So you mean based on the errors inside dll_load_library, we set the errno to > appropriate Error and then check for that before using the fallback , is that > correct ? No. I mean distinguish whether dlopen fails because it could not find the file (a valid reason to retry with *.a) or failed for another reason (which would not be a valid reason to retry with *.a). - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-194840
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Thu, 15 Feb 2024 17:50:23 GMT, Suchismith Roy wrote: > > Hi, > > some remarks: > > > > * there is no need for a copy for the first call to dll_load_library. Just > > hand in the string 1:1. > > * I would only do the *.a fallback loading if the error indicates that the > > *.so file had not been there. So, only if EACCESS or ENOENT; in all other > > cases I would not do the fallback. E.g. if the *.so file cannot be loaded > > due to a header mismatch. See > > https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines > > * Please use os::strdup. > > * Please assert that the replacement string is smaller than the original > > string (which it should be, *.so is longer than *.a, but this is insurance > > against anyone changing the code in the future) > > > > Thank you, Thomas > > Sure working on them. May i know why we are using the load routine in the 2nd > point ? . Currently we do a *.a fallback only when dlopen fails. Does load > function save some steps here ? I don't understand the question, sorry. What I mean is when the first dlopen fails AND its error indicates the shared library had been missing, only then attempt the *.a fallback. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1947780121
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v22]
On Mon, 12 Feb 2024 18:04:21 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > Remove not matched trailing whitespaces Hi, some remarks: - there is no need for a copy for the first call to dll_load_library. Just hand in the string 1:1. - I would only do the *.a fallback loading if the error indicates that the *.so file had not been there. So, only if EACCESS or ENOENT; in all other cases I would not do the fallback. E.g. if the *.so file cannot be loaded due to a header mismatch. See https://www.ibm.com/docs/en/aix/7.1?topic=l-load-loadandinit-subroutines - Please use os::strdup. - Please assert that the replacement string is smaller than the original string (which it should be, *.so is longer than *.a, but this is insurance against anyone changing the code in the future) Thank you, Thomas - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1882051726
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v13]
On Wed, 31 Jan 2024 13:17:21 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > spelling I'm busy with FOSDEM this week and probably next. Will look at this end of next week or the week after. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1921004805
Re: RFR: 8307977: jcmd and jstack broken for target processes running with elevated capabilities
On Tue, 30 Jan 2024 10:47:22 GMT, Sebastian Lövdahl wrote: > 8307977: jcmd and jstack broken for target processes running with elevated > capabilities ping @jerboaa - PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1916676356
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]
On Thu, 25 Jan 2024 11:04:03 GMT, Suchismith Roy wrote: > > For me the unresolved question is still: > > > > * do we want an unconditional load of *.a for a given *.so (have yet to see > > any documentation for this a-file duality) > > Yes. The documentation link - > https://www.ibm.com/docs/en/aix/7.3?topic=memory-shared-objects-run-time-linking > The text **In dynamic mode, input files specified with the -l flag may end > in .so, as well as in .a. That is, a reference to -lfoo is satisfied by the > first libfoo.so or libfoo.a found in any of the directories being searched. > Dynamic mode is in effect by default unless the -bstatic option is used.** > > https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command > > Archive files are composite objects, which usually contain import files and > object files, including shared objects. If an archive file contains another > archive file or a member whose type is not recognized, the ld command issues > a warning and ignores the unrecognized member. If an object file contained in > an archive file has the F_LOADONLY bit set in the XCOFF header, the ld > command ignores the member. This bit is usually used to designate old > versions of shared objects that remain in the archive file to allow existing > applications to load and run. New applications link with the new version of > the shared object, that is, another member of the archive. Excellent, thank you. > > > * if we do, do we want that to be bidirectional? Someone specifies *.a, do > > we want to attempt to load *.so? > > Considering the different scenarios, loading .a after .so failure should > suffice. I got a chance to look at the right file in OpenJ9-omr ,which has a > native code which does an attempt to load archive files after trying to load > .so files. This code was always there and it explains why the issue did not > occur in Semeru, which is derived from this repository. Okay. We don't have to be better than J9 then. If they do it, we should too. So, for the following input, we do: "library.so" -> load "library.so", then "library.a" "library"-> load "library.so", then "library.a" ? "library.a" -> only load "library.a" ? (*) > > > When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I > > would like a clear documentation as a comment in os_aix.cpp explaining the > > logic and referencing the relevant OpenJ9 files. > > Any example comment you can refer ? I mean i just mention the file name in > OpenJ9 and explain the logic ? Let me know for any further clarifications Just reference the excerpts you mentioned above, then describe your intended logic. Example: "When loading .so, upon failure we attempt to load . When loading a library given without extension, ..." Explaining the logic makes it easy to see for the casual code reader what your intent is, that you have thought of all cases (*), and makes it possible to check the coding against your intent. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1911652995
Re: RFR: JDK-8324637: [aix] Implement support for reporting swap space in jdk.management
On Thu, 25 Jan 2024 12:30:15 GMT, Matthias Baesken wrote: > The get_total_or_available_swap_space_size coding misses AIX support, we only > return 0. This should be enhanced. > The perfstat API can be used, see > https://www.ibm.com/docs/pt/aix/7.2?topic=interfaces-perfstat-memory-total-interface > . Small nit, otherwise good. src/jdk.management/unix/native/libmanagement_ext/OperatingSystemImpl.c line 113: > 111: throw_internal_error(env, "perfstat_memory_total failed"); > 112: } > 113: return available ? (jlong)(memory_info.pgsp_free * 4L * 1024L) : > (jlong)(memory_info.pgsp_total * 4L * 1024L); Do we need the cast? perfstat_memory_total_t members are all 64-bit, no? Also, can we shorten this to: return (available ? memory_info.pgsp_free : memory_info.pgsp_total) * 4096; - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17569#pullrequestreview-1843868729 PR Review Comment: https://git.openjdk.org/jdk/pull/17569#discussion_r1466454083
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v9]
On Tue, 16 Jan 2024 08:36:49 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with three > additional commits since the last revision: > > - Update porting_aix.cpp > - Update porting_aix.cpp > - Update os_aix.cpp For me the unresolved question is still: - do we want an unconditional load of *.a for a given *.so (have yet to see any documentation for this a-file duality) - if we do, do we want that to be bidirectional? Someone specifies *.a, do we want to attempt to load *.so? When in doubt, we should just mimic what OpenJ9 is doing on AIX. But I would like a clear documentation as a comment in os_aix.cpp explaining the logic and referencing the relevant OpenJ9 files. There are some minor issues with the code itself, but I will defer reviewing. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1907550400
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v10]
On Fri, 22 Dec 2023 14:50:16 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > additional fix of sideeffect reported in JDK-8322691 src/hotspot/os/aix/porting_aix.cpp line 1071: > 1069: if (max_handletable == 0) { > 1070: // First time we allocate memory for 128 Entries > 1071: char* ptmp = (char*)::malloc(128 * sizeof(struct > handletableentry)); No need for malloc. You can start with realloc, since realloc(NULL, ...) is malloc. static handletablentry* tab = nullptr; static unsigned max_handles = 0; ... if (need more handles) unsigned new_max = MAX2(max_handles * 2, init_num_handles); handleentry* new_tab = ::realloc(p_handletable, sizeof(handleentry) * new_max); if (new_tab != nullptr) { max_handles = new_max; tab= new_tab; } - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1435139923
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 11:54:17 GMT, Martin Doerr wrote: >> Dynamic allocation also opens us up to potential initialization issues, >> unless we explicitly use raw ::malloc. It should work, but I think its >> better avoided unless we really need it. > > Well we're fixing an academic issue by introducing another one? Doesn't make > sense to me. Okay, I butt out, I don't care enough. Up to you both to decide what to do. My recommendation would still be to avoid hotspot infrastructure that relies on os::malloc and friends; other than that, rewriting this table to make it growable using realloc should be trivial. Note that we need *some* sort of limit though. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433996779
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 11:23:46 GMT, Joachim Kern wrote: >> I don't like introducing unnecessary limitations. Are we sure nobody will >> ever need more than 1024 handles? >> Can't we at least use a GrowableArray or something like that? > > In principle you are right, but in my opinion 1024 is an academical limit. I > never saw processes with more than a few dozen loaded libraries. Dynamic allocation also opens us up to potential initialization issues, unless we explicitly use raw ::malloc. It should work, but I think its better avoided unless we really need it. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433963889
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]
On Thu, 21 Dec 2023 09:37:55 GMT, Suchismith Roy wrote: > > > > What happens if we accidentally attempt to load a "real" static > > > > library, which is also named *.a? Would dlopen() then crash? What would > > > > happen? > > > > > > > I don't think the problem is with *.a . They would load as the default > > > behaviour of the dlopen. It is only when the dlopen fails for *.so , we > > > give another chance to check for .a file with the same name. > > > > > > No, what I meant, and what must be clarified before going forward with this > > solution, is the following: > > > > * is _every_ `*.a` object on AIX loadable with `dlopen`, and will the > > result be the same as when loading a `*.so` object > > * or, if we present arbitrary `*.a` files to dlopen, is there a chance for > > dlopen to crash or misbehave. > > > > Reason is that I was under the impression that *.a libraries are static > > libraries and cannot be loaded dynamically. This is what you now try to do. > > If we cannot safely answer this question, I would opt for a more narrow > > solution by hard-wiring known alternative names. So, do the second *.a > > attempt only for your `ibm_16_am.a` which you know works. That could also > > be done in a reasonably maintainable manner. > > In AIX, both static and dynamic libraries have *.a extension. And AIX also > supports *.so files.Bascially shared objects in AIX have both *.a and *.so > extension. Hence we need to implement this logic. If we try loading a static > archive specifically ,how the dlopen would behave , that is something > probably @JoKern65 can answer ? Rather, this is a question you have to ask your collegues at IBM that develop the AIX libc. Since AIX libc is not open source, we cannot look for ourselves, nor can Joachim (her works at SAP). > > > > > Does this really have to be handled in the OpenJDK? What does J9 on AIX > > > > do? Could this be done in a simpler way outside OpenJDK, e.g. by > > > > providing an *.so variant of the library in question? Where does this > > > > library come from? > > > > > > > I am not sure how J9 handles this. I would have to consult . > > > > > > J9 is Open Source, can't you just look? :) > > I did try comparing the file structures, and i do not see a similar file > structure over there. I am unable to find the jvmTiAgent code and also os_aix > file. So i am not sure which functions over there are doing the same > functionality. You have any suggestion on how i can check and correlate ? Someone must implement LoadLibrary. Try looking for places where dlopen() is called. > > > > However as per current observation, this issue does not show up on > > > Semuru. This issue is only happening on Adoptium. The team that release > > > these file has always released *.a files which work fine for Semuru. > > > > > > I don't know what Semuru is. What is the context, is that a different VM? > > Also OpenJDK? J9 derived? > > Semuru is J9 derived. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865977132
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Thu, 21 Dec 2023 09:37:57 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 916: >> >>> 914: constexpr int max_handletable = 1024; >>> 915: static int g_handletable_used = 0; >>> 916: static struct handletableentry g_handletable[max_handletable] = {{0, >>> 0, 0, 0}}; >> >> Wouldn't `ConcurrentHashTable` be a better data structure? It is already >> used in hotspot, can grow dynamically and doesn't need linear search. > > There will be only few libraries in the list. With this assumption Thomas > suggested to use just a simple array. Let's keep it simple. A linear array of only a few items is easily scanned, probably faster than pointer hopping hash table entries. Not that it matters in any way for the few calls to dlopen. Also, avoiding hotspot structures preserves layer integrity (porting_aix does not pull anything from hotspot so far) and prevents initialisation time dependencies. Not sure whether ConcurrentHashTable works before VM init, but with Joachimes current solution, we can call dlopen at any time in VM life. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433839119
Re: RFR: JDK-8320005 : Allow loading of shared objects with .a extension on AIX [v7]
On Thu, 21 Dec 2023 08:16:22 GMT, Suchismith Roy wrote: >> What happens if we accidentally attempt to load a "real" static library, >> which is also named *.a? Would dlopen() then crash? What would happen? > I don't think the problem is with *.a . They would load as the default > behaviour of the dlopen. It is only when the dlopen fails for *.so , we give > another chance to check for .a file with the same name. No, what I meant, and what must be clarified before going forward with this solution, is the following: - is *every* `*.a` object on AIX loadable with `dlopen`, and will the result be the same as when loading a `*.so` object - or, if we present arbitrary `*.a` files to dlopen, is there a chance for dlopen to crash or misbehave. Reason is that I was under the impression that *.a libraries are static libraries and cannot be loaded dynamically. This is what you now try to do. If we cannot safely answer this question, I would opt for a more narrow solution by hard-wiring known alternative names. So, do the second *.a attempt only for your `ibm_16_am.a` which you know works. That could also be done in a reasonably maintainable manner. >> Does this really have to be handled in the OpenJDK? What does J9 on AIX do? >> Could this be done in a simpler way outside OpenJDK, e.g. by providing an >> *.so variant of the library in question? Where does this library come from? > I am not sure how J9 handles this. I would have to consult . J9 is Open Source, can't you just look? :) > However as per current observation, this issue does not show up on Semuru. > This issue is only happening on Adoptium. The team that release these file > has always released *.a files which work fine for Semuru. I don't know what Semuru is. What is the context, is that a different VM? Also OpenJDK? J9 derived? - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1865859345
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v8]
On Wed, 20 Dec 2023 14:53:06 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > improve error handling still ok, small nit inside src/hotspot/os/aix/porting_aix.cpp line 1033: > 1031: // filled by os::dll_load(). This way we mimic dl handle equality for a > library > 1032: // opened a second time, as it is implemented on other platforms. > 1033: void* Aix_dlopen(const char* filename, int Flags, const char** > error_report) { add assert for error_report != nullptr - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1792301031 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1433606032
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v7]
On Wed, 20 Dec 2023 11:16:03 GMT, Suchismith Roy wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > Suchismith Roy has updated the pull request incrementally with one additional > commit since the last revision: > > Spaces fix Hi, some requests and questions: - Please modify the JBS title, PR title, and JBS issue text to reflect that this adds an alternative shared object loading path for shared objects on AIX. Something like "Allow loading shared objects with .a extension on AIX". Please describe the new logic in the JBS issue text. - Does this really have to be handled in the OpenJDK? What does J9 on AIX do? Could this be done in a simpler way outside OpenJDK, e.g. by providing an *.so variant of the library in question? Where does this library come from? - What happens if we accidentally attempt to load a "real" static library, which is also named *.a? Would dlopen() then crash? What would happen? - What happens if the original path handed to os::dll_load is already a *.a file? Should the logic then be reversed? - We really need regression tests for this. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1864572287
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Tue, 19 Dec 2023 12:37:33 GMT, Suchismith Roy wrote: >> The libpath parsing code is from me, so no license problems. > > Hi @JoKern65 Is this good to integrate now ? @suchismith1993 > Once this code goes in I can push in my changes. We are targeting the fix for > January. If you talk about https://github.com/openjdk/jdk/pull/16604, no, you cannot push that even if Joachim finishes his work. Your patch has not even a single review, is quite controversial, and none of the issues the reviewers have mentioned are addressed. This needs a lot more discussion time. - PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862704694
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]
On Mon, 18 Dec 2023 13:33:46 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Followed Thomas proposals > > Well done. > > Releasing the mutex before asserting is not necessary; we don't pull the > handle table lock as part of error reporting. > @tstuefe Sorry to tag you. Can you review the code. Once this code goes in I > can push in my changes. We are targeting the fix for January. > Hi @JoKern65 Is this good to integrate now ? @suchismith1993 Please don't put pressure on patch authors and developers. There is zero reason why this patch should be rushed. > Hi @suchismith1993, I'm waiting for a second review. Complex hotspot changes > should be reviewed twice. Not only that, hotspot changes *need* to be reviewed by at least two reviewers. That is not optional. See OpenJDK bylaws. - PR Comment: https://git.openjdk.org/jdk/pull/16920#issuecomment-1862695052
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 11:12:23 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 1097: >> >>> 1095: } >>> 1096: >>> 1097: pthread_mutex_lock(_handletable_mutex); >> >> You can make your life a lot easier by defining an RAII object at the start >> of the file: >> >> struct TableLocker { >> TableLocker() { pthread_mutex_lock(_handletable_mutex); } >> ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); } >> }; >> >> and just place this at the beginning of your two functions >> >> TableLocker lock: >> ... >> >> >> no need to manually unlock then, with the danger of missing a return. > > Great, thank you. This was one of the things I thought about, but was not > sure, because I did not fully understood the MutexLocker class and the > difference between Monitor and Mutex. In hindsight, pthread mutex is the better choice anyway: it avoids dependencies to the VM lifecycle (VM mutexes are only available after VM initialization, so we could not call dlopen() before that). - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1430380082
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 10:35:48 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/porting_aix.cpp line 990: >> >>> 988: if (env == nullptr) { >>> 989: // no LIBPATH, try with LD_LIBRARY_PATH >>> 990: env = getenv("LD_LIBRARY_PATH"); >> >> Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX. > > Yes it is, It's the fallback if LIBPATH is not defined In that case there may be errors in other places, since so far we assumed its either one or the other, but not both. Example: https://github.com/openjdk/jdk/blob/a247d0c74bea50f11d24fb5f3576947c6901e567/src/java.base/unix/native/libjli/java_md.c#L43C1-L47 Maybe you need to take a look here, in case LD_LIBRARYPATH needs to be handled in addition to LIBPATH? - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429917901
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v6]
On Mon, 18 Dec 2023 11:30:59 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > Followed Thomas proposals Well done. Releasing the mutex before asserting is not necessary; we don't pull the handle table lock as part of error reporting. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786905733
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Mon, 18 Dec 2023 10:06:34 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - trailing whitespace >> - Following most of Thomas proposals > > src/hotspot/os/aix/porting_aix.cpp line 934: > >> 932: struct scnhdr the_scn; >> 933: struct ldhdr the_ldr; >> 934: size_t sz = FILHSZ + _AOUTHSZ_EXEC; > > please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...` Also, can you please add STATIC_ASSERT(sizeof(the_xcoff) == xcoffsz); STATIC_ASSERT(sizeof(the_scn) == SCNHSZ); STATIC_ASSERT(sizeof(the_ldr) == LDHDRSZ); - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429853292
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v5]
On Fri, 15 Dec 2023 11:57:51 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with two additional > commits since the last revision: > > - trailing whitespace > - Following most of Thomas proposals I like this, this is good. Small nits remain. src/hotspot/os/aix/os_aix.cpp line 30: > 28: #pragma alloca > 29: > 30: please remove whitespace change src/hotspot/os/aix/os_aix.cpp line 193: > 191: // local variables > 192: > 193: please remove whitespace change src/hotspot/os/aix/os_aix.cpp line 1113: > : } > 1112: > 1113: please remove whitespace change src/hotspot/os/aix/porting_aix.cpp line 934: > 932: struct scnhdr the_scn; > 933: struct ldhdr the_ldr; > 934: size_t sz = FILHSZ + _AOUTHSZ_EXEC; please rename to xcoffsz, and make constexpr: `constexpr size_t xcoffsz = ...` src/hotspot/os/aix/porting_aix.cpp line 990: > 988: if (env == nullptr) { > 989: // no LIBPATH, try with LD_LIBRARY_PATH > 990: env = getenv("LD_LIBRARY_PATH"); Is LD_LIBRARY_PATH a thing on AIX? I thought it is only used on non-AIX. src/hotspot/os/aix/porting_aix.cpp line 1005: > 1003: // LIBPATH or LD_LIBRARY_PATH and second with burned in libpath. > 1004: // No check against current working directory > 1005: Libpath.print("%s:%s", env, rtv_linkedin_libpath()); Are you sure libpath env var has precedence over the baked-in libpath? src/hotspot/os/aix/porting_aix.cpp line 1097: > 1095: } > 1096: > 1097: pthread_mutex_lock(_handletable_mutex); You can make your life a lot easier by defining an RAII object at the start of the file: struct TableLocker { TableLocker() { pthread_mutex_lock(_handletable_mutex); } ~TableLocker() { pthread_mutex_unlock(_handletable_mutex); } }; and just place this at the beginning of your two functions TableLocker lock: ... no need to manually unlock then, with the danger of missing a return. src/hotspot/os/aix/porting_aix.cpp line 1101: > 1099: for (i = 0; i < g_handletable_used; i++) { > 1100: if (g_handletable[i].handle == libhandle) { > 1101: // handle found, decrease refcount `assert(refcount > 0, "Sanity"))` src/hotspot/os/aix/porting_aix.cpp line 1143: > 1141: // entry of the array to the place of the entry we want to remove > and overwrite it > 1142: if (i < g_handletable_used) { > 1143: g_handletable[i] = g_handletable[g_handletable_used]; To be super careful, I would zero out at least the handle of the moved item like this: `g_handletable[g_handletable_used].handle = nullptr` - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1786400492 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870755 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870833 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870885 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429849403 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429858465 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429859923 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429868182 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429863665 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1429870057
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 10:18:53 GMT, Joachim Kern wrote: >> src/hotspot/os/aix/os_aix.cpp line 206: >> >>> 204: constexpr int max_handletable = 1024; >>> 205: static int g_handletable_used = 0; >>> 206: static struct handletableentry g_handletable[max_handletable] = {{0, >>> 0, 0, 0}}; >> >> I would move all that new and clearly delineated dlopen stuff into an own >> file, e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already >> have wrappers for other functions). os_aix.cpp is already massive. > > I moved the static variable declarations and the functions `Aix_dlopen(), > search_file_in_LIBPATH(), rtv_linkedin_libpath()` and `os::pd_dll_unload()` > to porting_aix.cpp. This links, but in my opinion `os::pd_dll_unload()` > should reside in os_aix.cpp, because it is member of the os class. But there > it will not compile anymore if the static variables are moved away. No, what I meant was to provide a "libc-like" equivalent for dlopen, similar to what we do with dladdr (see https://github.com/openjdk/jdk/blob/b7676822886eac21f61ff361a32928a966d8fe31/src/hotspot/os/aix/porting_aix.cpp#L306). But never mind; I am also fine with moving os::pd_dlopen into a different cpp file, e.g. "dlopen_aix.cpp". Just move it out of os_aix.cpp, since that is already massive and you add >300 lines of more code and more dependencies. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427812795
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 09:57:19 GMT, Joachim Kern wrote: > If we omit the xcoff32 we have to ensure that no xcoff32 executable file > comes into play. xcoff32 is for 32-bit binaries. The AIX port only exists for 64-bit, and there will never be a 32-bit AIX port, so there is no reason for handling 32-bit xcoff headers. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427803763
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Fri, 15 Dec 2023 06:22:39 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> followed the proposals > > src/hotspot/os/aix/os_aix.cpp line 1129: > >> 1127: >> 1128: // get the library search path burned in to the executable file during >> linking >> 1129: // If the libpath cannot be retrieved return an empty path > > This is new. Is this complexity needed, if yes, why? Don't see a comment, may > have missed it. Also, why are we parsing xcoff32 headers in there? AIX OpenJDK will always be 64-bit. So, you can replace the whole xcoff32 section with assert( f_magic == U802TOCMAGIC, ..). The function becomes a lot simpler then. > src/hotspot/os/aix/os_aix.cpp line 1132: > >> 1130: static const char* rtv_linkedin_libpath() { >> 1131: static char buffer[4096]; >> 1132: static const char* libpath = 0; > > If your intent is to return an empty buffer if there is no contained libpath, > I would just: > > > static const char* libpath = ""; > > then you can always just return libpath. But looking at the using code, returning NULL in case there is no contained libpath would be actually easier, see below. - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427609926 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1427639138
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v4]
On Tue, 12 Dec 2023 14:05:48 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > followed the proposals Is this libpath parsing code copied from the R3 kernel? If yes, pls make sure there are no licensing issues. src/hotspot/os/aix/os_aix.cpp line 206: > 204: constexpr int max_handletable = 1024; > 205: static int g_handletable_used = 0; > 206: static struct handletableentry g_handletable[max_handletable] = {{0, 0, > 0, 0}}; I would move all that new and clearly delineated dlopen stuff into an own file, e.g. dlopen_aix.cpp or porting_aix.cpp (in porting_aix.cpp, we already have wrappers for other functions). os_aix.cpp is already massive. src/hotspot/os/aix/os_aix.cpp line 1129: > 1127: > 1128: // get the library search path burned in to the executable file during > linking > 1129: // If the libpath cannot be retrieved return an empty path This is new. Is this complexity needed, if yes, why? Don't see a comment, may have missed it. src/hotspot/os/aix/os_aix.cpp line 1131: > 1129: // If the libpath cannot be retrieved return an empty path > 1130: static const char* rtv_linkedin_libpath() { > 1131: static char buffer[4096]; This coding has some issues: - a generic char buffer is not a good idea. Forces you to do casts all over the place, and introduces alignment issues with unaligned char buffer. Which I assume is the reason for all the separate memcpy-into-structures below. I would just read into the structures directly. - you need to check the return codes for fread to make sure you read the number of bytes expected, lest you work with uninitialized memory and maybe to handle sporadic EINTR. - I don't get all the separate "SZ" macros. They must be equal to sizeof(structure), right, otherwise you get buffer overruns or work with uninitialized memory? Proposal: add a local wrapper function like this: template static bool my_checked_fread(FILE* f, T* out) { // read sizeof(T) from f. // Check return code. // Return bool if sizeof(T) bytes were read. e.g. in a very trivial form: int bytesread = fread(out, sizeof(T), 1, f); return bytesread == sizeof(T); } and use it in your code like this: struct xcoff64 the_xcoff64; struct scn64 the_scn64; struct ldr64 the_ldr64; if (!my_checked_fread(f, _xcoff64)) { assert? } ... if (!my_checked_fread(f, _ldr64) { .. handle error } src/hotspot/os/aix/os_aix.cpp line 1132: > 1130: static const char* rtv_linkedin_libpath() { > 1131: static char buffer[4096]; > 1132: static const char* libpath = 0; If your intent is to return an empty buffer if there is no contained libpath, I would just: static const char* libpath = ""; then you can always just return libpath. src/hotspot/os/aix/os_aix.cpp line 1135: > 1133: > 1134: if (libpath) > 1135: return libpath; { } src/hotspot/os/aix/os_aix.cpp line 1137: > 1135: return libpath; > 1136: > 1137: char pgmpath[32+1]; Will overflow if pid_t is 64bit. Give it a larger size; after all, you are giving buffer 4K above, so you are not overly concerned with saving stack space. src/hotspot/os/aix/os_aix.cpp line 1146: > 1144: fread(buffer, 1, FILHSZ_64 + _AOUTHSZ_EXEC_64, f); > 1145: > 1146: if (((struct filehdr*)buffer)->f_magic == U802TOCMAGIC ) { as stated above, I don't think this section is needed. src/hotspot/os/aix/os_aix.cpp line 1170: >
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]
On Tue, 5 Dec 2023 13:21:35 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> encapsulate everything in os::Aix::dlopen > > src/hotspot/os/aix/os_aix.cpp line 3133: > >> 3131: return nullptr; >> 3132: } >> 3133: // library not still loaded and still place in array, so load >> library > > s/still/yet No need to be this verbose either, especially since the comment is somewhat misleading. "create entry at end of table" implies that we have a dynamically growing table and allocate new entries. Proposal: "Library not yet loaded; load it, then store its handle in handle table". - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415605856
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v2]
On Mon, 4 Dec 2023 12:33:26 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > improve handling of nonexisting files src/hotspot/os/aix/os_aix.cpp line 203: > 201: constexpr int max_handletable = 1024; > 202: static int g_handletable_used = 0; > 203: static struct handletableentry g_handletable[max_handletable] = > {{0,0,0,0}}; style nits: - we usually write the * behind type, not before var name - `{{0,0}}` -> insert spaces src/hotspot/os/aix/os_aix.cpp line 1159: > 1157: result = ::dlopen(filename, dflags); > 1158: if (result != nullptr) { > 1159: assert(false, "dll_load: Could not stat() file %s, but dlopen() > worked; Have to improve stat()", filename); use assert(result != nullptr) and remove condition - PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413843503 PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1413846111
Re: RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]
On Tue, 5 Dec 2023 12:11:46 GMT, Joachim Kern wrote: >> On AIX, repeated calls to dlopen referring to the same shared library may >> result in different, unique dl handles to be returned from libc. In that it >> differs from typical libc implementations that cache dl handles. >> >> This causes problems in the JVM with code that assumes equality of handles. >> One such problem is in the JVMTI agent handler. That problem was fixed with >> a local fix to said handler >> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this >> fix causes follow-up problems since it assumes that the file name passed to >> `os::dll_load()` is the file that has been opened. It prevents efficient, >> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. >> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it >> is a hack that causes other, more uglier hacks to follow (see discussion of >> https://github.com/openjdk/jdk/pull/16604). >> >> We propose a different, cleaner way of handling this: >> >> - Handle this entirely inside the AIX versions of os::dll_load and >> os::dll_unload. >> - Cache dl handles; repeated opening of a library should return the cached >> handle. >> - Increase handle-local ref counter on open, Decrease it on close >> - Make sure calls to os::dll_load are matched to os::dll_unload (See >> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)). >> >> This way we mimic dl handle equality as it is implemented on other >> platforms, and this works for all callers of os::dll_load. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > encapsulate everything in os::Aix::dlopen Excellent, this is how I have pictured a good solution. Very nice. A number of remarks, but nothing fundamental. src/hotspot/os/aix/os_aix.cpp line 1137: > 1135: if (ebuf != nullptr && ebuflen > 0) { > 1136: ::strncpy(ebuf, "dll_load: empty filename specified", ebuflen - > 1); > 1137: } Are there any cases where we don't hand in the error buffer? If so, I would just assert ebuf and ebuflen. No need for this kind of flexibility. src/hotspot/os/aix/os_aix.cpp line 3051: > 3049: > 3050: // Simulate the library search algorithm of dlopen() (in os::dll_load) > 3051: int os::Aix::stat64x_via_LIBPATH(const char* path, struct stat64x* > stat) { - no need to export this, make it filescope static - please return bool, with false = error - please rename it to something like "search_file_in_LIBPATH" src/hotspot/os/aix/os_aix.cpp line 3055: > 3053: return -1; > 3054: > 3055: char *path2 = strdup (path); Please use os::strdup and os::free. If you really intent to use the plain libc versions, use `::strdup` and `::free` to make sure - and indicate to code readers - you use the global libc variants. src/hotspot/os/aix/os_aix.cpp line 3059: > 3057: int idx = strlen(path2) - 1; > 3058: if (path2[idx] == ')') { > 3059: while (path2[idx] != '(' && idx > 0) idx--; ? Why not `strrchr()`? src/hotspot/os/aix/os_aix.cpp line 3067: > 3065: if (path2[0] == '/' || > 3066: (path2[0] == '.' && (path2[1] == '/' || > 3067: (path2[1] == '.' && path2[2] == '/' { This complexity is not needed, nor is it sufficient, since it does not handle relative paths ("mydirectory/hallo.so") https://www.ibm.com/docs/en/aix/7.1?topic=d-dlopen-subroutine "If FilePath contains a slash character, FilePath is used directly, and no directories are searched. " So, just scan for a '/' - if you find one, its a path to be opened directly: const bool use_as_filepath = strchr(path2, '/'); src/hotspot/os/aix/os_aix.cpp line 3085: > 3083: strcpy(libpath, env); > 3084: for (token = strtok_r(libpath, ":", ); token != nullptr; > token = strtok_r(nullptr, ":", )) { > 3085: sprintf(combined, "%s/%s", token, path2); You can save a lot of pain and manual labor by using `stringStream` here. stringStream combined; combined.print("%s/%s", token, path2); const char* combined_path_string = combined.base(); no need for manual allocation and byte counting. src/hotspot/os/aix/os_aix.cpp line 3099: > 3097: // filled by os::dll_load(). This way we mimic dl handle equality for a > library > 3098: // opened a second time, as it is implemented on other platforms. > 3099: void* os::Aix::dlopen(const char* filename, int Flags) { Does not need to be exported, nor does os::AIX::dlclose. Make file scope static. See my remarks in os_posix.cpp. src/hotspot/os/aix/os_aix.cpp line 3103: > 3101: struct stat64x libstat; > 3102: > 3103: if (os::Aix::stat64x_via_LIBPATH(filename, )) { Please return bool, not unix int -1, this hurts my brain :-) src/hotspot/os/aix/os_aix.cpp line 3108: > 3106: if (result != nullptr) { > 3107: assert(false, "dll_load: Could not stat() file %s, but dlopen() > worked; Have to improve stat()", filename); > 3108:
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]
On Mon, 4 Dec 2023 00:41:23 GMT, David Holmes wrote: > From the blog: > > > Yes! The methods are being deallocated for a class loader that is still > > alive. > > Okay so why does this happen and is it a reasonable thing to be happening? On > the surface it sounds wrong to deallocate anything associated with a live > classloader. This sounds odd to me to. I know that we deallocate the old *byte code* of re-transformed classes; but `Method*` ? - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1838413238
Re: RFR: 8320935: Move CDS config initialization code to cdsConfig.cpp [v2]
On Sat, 2 Dec 2023 00:38:58 GMT, Ioi Lam wrote: >> This is a simple clean up that moves the code for initializing the CDS >> config states from arguments.cpp to cdsConfig.cpp >> >> I renamed a few functions, but otherwise the code is unchanged. >> >> - `get_default_shared_archive_path()` -> `default_archive_path()` >> - `GetSharedArchivePath()` -> `static_archive_path()` >> - `GetSharedDynamicArchivePath()` -> `dynamic_archive_path()` >> >> There's also less `#if INCLUDE_CDS` since the entire cdsConfig.cpp file is >> compiled only if CDS is enabled. > > Ioi Lam has updated the pull request incrementally with one additional commit > since the last revision: > > fixed indentation Looks good. Did not find any functional difference to the original code. - Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16868#pullrequestreview-1760796058
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11]
On Wed, 29 Nov 2023 11:49:31 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method >> redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` >> instance. Once that method gets redefined, the `jmethodID` is updated to >> point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original >> `jmethodID` will be redirected to the latest `Method` version and at the >> same time the 'previous' `Method` version will receive a new `jmethodID` >> pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or >> `GetAllStackTraces` JVMTI calls while this previous `Method` version is >> still on stack we will have the corresponding frame identified by a >> `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for >> cleanup at what time all contained `Method` instances. The cleanup process >> will not perform the `jmethodID` pointer maintenance and we will end up with >> pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to >> `ClassLoaderData` instance and all relevant `jmethodID`s will get >> batch-updated when the class loader is being released and all its classes >> are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being >> deallocate the associated `jmethodID` (if any) must not point to the >> deallocated instance once we are finished. Unfortunately, we can not just >> update the `jmethodID` values in bulk when purging an old class version - >> the per `InstanceKlass` jmethodID cache is present only for the main class >> version and contains `jmethodID` values for both the old and current method >> versions. >> >> ~Therefore we need to perform `jmethodID` lookup when we are about to >> deallocate a `Method` instance and clean up the pointer only if that >> `jmethodID` is pointing to the `Method` instance which is being deallocated.~ >> >> Therefore, we need to perform `jmethodID` lookup for each method in an old >> class version that is getting purged, and null out the pointer of that >> `jmethodID` to break the link from `jmethodID` to the method instance that >> is about to get deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my >> blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one > additional commit since the last revision: > > Restrict cleanup to obsolete methods only I won't be able to review this this week, too snowed in atm. I can take a look next week. We can always just revert the change if needed. Thinking about Skara, I think as long as we have this confusing mixture of rules (hotspot wants 2 reviewers that are Reviewer/Committer, but some jdk libs only want one, but then you need two for desktop I think otherwise Phil gets angry) - we should hard-code the 2-reviewer rule into skara as default since it affects the lion's share of all changes. - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1835474161
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Tue, 28 Nov 2023 11:27:33 GMT, suchismith1993 wrote: > > > > @tstuefe 3rd parameter to pass the either of 2 things: > > 1. The JvmTiAgent object "agent", so that after shifting the > save_library_signature to os_aix,we can still access the agent object.-> For > this i tried importing jvm/prims header file, but i get segmentation faults > during build . Not sure if i am doing it the right way. > > 2. Pass a character buffer(and not const char*) where we copy the > modified filename back to it and then use it in jvmAgent. code. Does not sound really appealing tbh. We pile one hack atop of another. Please synchronize with @JoKern65 at SAP. He will rewrite the JVMTI handler code, which will make this point moot. See https://bugs.openjdk.org/browse/JDK-8320890. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829776298
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Mon, 27 Nov 2023 15:44:58 GMT, suchismith1993 wrote: > > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if > > > > the condition fails for .so files, because i have to reload it again > > > > and check if the .a exists. In the shared code i had repeat less number > > > > of lines i believe. Do you suggest moving lines 1132 to 1139 to another > > > > function then ? > > > > > > > > > @tstuefe Any suggestion on this ? > > > > > > ``` > > --- a/src/hotspot/os/aix/os_aix.cpp > > +++ b/src/hotspot/os/aix/os_aix.cpp > > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, > > char* buf, > >return true; > > } > > > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) > > { > > > >log_info(os)("attempting shared library load of %s", filename); > > > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, > > int ebuflen) { > >return nullptr; > > } > > > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > + > > + void* result = nullptr; > > + > > + // First try using *.so suffix; failing that, retry with *.a suffix. > > + const size_t len = strlen(filename); > > + constexpr size_t safety = 3 + 1; > > + constexpr size_t bufsize = len + safety; > > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > > + strcpy(buf, filename); > > + char* const dot = strrchr(buf, '.'); > > + > > + assert(dot != nullptr, "Attempting to load a shared object without > > extension? %s", filename); > > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > > + "Attempting to load a shared object that is neither *.so nor *.a", > > filename); > > + > > + sprintf(dot, ".so"); > > + result = dll_load_inner(buf, ebuf, ebuflen); > > + > > + if (result == nullptr) { > > +sprintf(dot, ".a"); > > +result = dll_load_inner(buf, ebuf, ebuflen); > > + } > > + > > + FREE_C_HEAP_ARRAY(char, buf); > > + > > + return result; > > +} > > + > > ``` > > @tstuefe as discussed with @TheRealMDoerr do you think using default argument > will help ? Either we pass agent object as 3rd parameter or an empty > character buffer(and not const chat*) which would be spcifically used to copy > the alternate filename to it using strcpy so that it is reflected in the > jvmagent code ? A third parameter for what? - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829602040
Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]
On Mon, 27 Nov 2023 13:23:42 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional >> commit since the last revision: >> >> adopt types > > This now causes problems with > > https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 > > since it removes the possibility of silently alternating the file path inside > os::dll_load, which would be the preferred way for AIX to handle *.a shared > objects. So this causes even more ifdef AIX to bloom up everywhere. > > A much better solution would have been to mimic stable-handle behavior inside > the AIX version of `os::dll_load`. > > Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple > mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) > tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp > without any changes to shared coding, and would have provided complete > coverage for all users of dll_load. > @tstuefe: Hi Thomas, I'm not sure if I got it. We can make (inode, devid) to > a hash, which replaces the dlhandle on return of os::dlload. This hash would > of course be the same if the same library is loaded twice. But I do not know > how to handle the two real dlhandles. Why do you need two dlhandles? A handle returned from dlopen should be valid for the whole process. If you cache that in a hashmap, and for the second caller of os::dll_load() return the cached variant, this should work, no? - PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1828019003
Re: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5]
On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , >> the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in >> [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try >> of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) >> was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of >> the root cause. >> The root cause is, that dlopen() on AIX returns different handles every >> time, even if you load a library twice. There is no official AIX API >> available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and >> st_inode. After a dlopen() the stat64x() API is called additionally to get >> this information which is then stored parallel to the library handle in the >> jvmtiAgent. For AIX we then can compare these values instead of the library >> handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional > commit since the last revision: > > adopt types This now causes problems with https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 since it removes the possibility of silently alternating the file path inside os::dll_load, which would be the preferred way for AIX to handle *.a shared objects. So this causes even more ifdef AIX to bloom up everywhere. A much better solution would have been to mimic stable-handle behavior inside the AIX version of `os::dll_load`. Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) tupel. On dlclose, remove entry. Could have been done inside os_aix.cpp without any changes to shared coding, and would have provided complete coverage for all users of dll_load. - PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1827826924
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Fri, 24 Nov 2023 11:45:25 GMT, suchismith1993 wrote: > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the > > condition fails for .so files, because i have to reload it again and check > > if the .a exists. In the shared code i had repeat less number of lines i > > believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > @tstuefe Any suggestion on this ? --- a/src/hotspot/os/aix/os_aix.cpp +++ b/src/hotspot/os/aix/os_aix.cpp @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, return true; } -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { log_info(os)("attempting shared library load of %s", filename); @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { return nullptr; } +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { + + void* result = nullptr; + + // First try using *.so suffix; failing that, retry with *.a suffix. + const size_t len = strlen(filename); + constexpr size_t safety = 3 + 1; + constexpr size_t bufsize = len + safety; + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); + strcpy(buf, filename); + char* const dot = strrchr(buf, '.'); + + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, + "Attempting to load a shared object that is neither *.so nor *.a", filename); + + sprintf(dot, ".so"); + result = dll_load_inner(buf, ebuf, ebuflen); + + if (result == nullptr) { +sprintf(dot, ".a"); +result = dll_load_inner(buf, ebuf, ebuflen); + } + + FREE_C_HEAP_ARRAY(char, buf); + + return result; +} + - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1825721906
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8]
On Thu, 23 Nov 2023 16:42:35 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method >> redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` >> instance. Once that method gets redefined, the `jmethodID` is updated to >> point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original >> `jmethodID` will be redirected to the latest `Method` version and at the >> same time the 'previous' `Method` version will receive a new `jmethodID` >> pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or >> `GetAllStackTraces` JVMTI calls while this previous `Method` version is >> still on stack we will have the corresponding frame identified by a >> `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for >> cleanup at what time all contained `Method` instances. The cleanup process >> will not perform the `jmethodID` pointer maintenance and we will end up with >> pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to >> `ClassLoaderData` instance and all relevant `jmethodID`s will get >> batch-updated when the class loader is being released and all its classes >> are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being >> deallocate the associated `jmethodID` (if any) must not point to the >> deallocated instance once we are finished. Unfortunately, we can not just >> update the `jmethodID` values in bulk when purging an old class version - >> the per `InstanceKlass` jmethodID cache is present only for the main class >> version and contains `jmethodID` values for both the old and current method >> versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to >> deallocate a `Method` instance and clean up the pointer only if that >> `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my >> blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one > additional commit since the last revision: > > Reinstate mistakenly deleted comment src/hotspot/share/oops/instanceKlass.hpp line 1087: > 1085: // - We can not use the jmethodID cache associated with klass > directly because the 'previous' versions > 1086: // do not have the jmethodID cache filled in. Instead, we need to > lookup jmethodID for each method and this > 1087: // is expensive - O(n) for one jmethodID lookup. For all > contained methods it is O(n^2). The comment is helpful, but its an implementation comment, and I would move this part to the implementation. Here, what matters to know is that this function nulls out jmethodIDs for all methods in this IK. The comment also refers to class transformation specifically, I'd mention that so that readers get a context. Do I understand correctly that this comment tries to explain why we - instead of just iterating IK->_methods_jmethod_ids directly - we iterate all methods of this IK and then lookup the associated jmethodID in IK->_methods_jmethod_ids, right? I wondered about that but IIUC the pointers inside IK->_methods_jmethod_ids may refer to jmethodID slots that have been reused for different methods? - PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403995913
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Thu, 23 Nov 2023 17:05:29 GMT, suchismith1993 wrote: > > I'm not a big fan of this approach. We accumulate more and more "#ifdef > > AIX" in shared code because of many recent AIX additions. No other platform > > has such a large ifdef footprint in shared code. > > I argue that all of this should be handled inside os_aix.cpp and not leak > > out into the external space: > > If .a is a valid shared object format on AIX, this should be handled in > > `os::dll_load()`, and be done for all shared objects. If not, why do we try > > to load a static archive via dlload in this case but not in other cases? > > _If_ this is needed in shared code, the string replacement function should > > be a generic utility function for all platforms, and it should be tested > > with a small gtest. A gtest would have likely uncovered the buffer overflow > > too. > > So i tried to check how to move the code to os_aix file. A few problems is > see : > > 1. When i have to implemented the logic at dll_load function, i would have to > repeat a lot of code after dlopen, i.e i have to call dlopen again for .so > files and hence i have to copy the logic again for it. > 2. Currently using function before dll_load,in the shared code makes this a > bit easier. >I have an alternate suggestion . >Shall we declare the utlity function as part of os ? and implement it > platform wise. Not without any need. If this is an AIX specific issue, it should be handed in os::dll_load on AIX. >In that way we just keep the actual implentation and aix and in windows > and linux we keep it empty. >So that way we can just call the utility function in shared code and it > wouldnt affect other platform and will run the usecase for AIX. >If that is not acceptable, then is there a better way to avoid repeating > the dlopen again in os_aix file ? I don't understand the problem. What is preventing you from using a file local scope utility function inside os::dll_load? - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824787258
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2]
On Thu, 23 Nov 2023 14:15:54 GMT, Thomas Stuefe wrote: >> @dholmes-ora >>> Can't we just check method->method_holder() for null rather than passing in >>> a parameter like this? >> >> I have removed the argument and I am now performing the check for >> `method_holder() != nullptr` as recommended. The code is a bit simpler and >> the cost of resolving the method holder for each method is probably quite >> low so we should be ok here. > > @jbachorik You are aware that this fix only works for some uncommon corner > cases, right? > > It only works if the Method is explicitly deallocated. The vast bulk of > Method aren't. Method, as a Metaspace object, is released in bulk when the > class is unloaded. The `::deallocate` path you fixed up - that eventually > ends up in `MetaspaceArena::deallocate()` - is a rare case that only gets > invoked if > - a class cannot be loaded but parts of it have already been loaded into > Metaspace. > - a class gets transformed > > In case the class gets unloaded via conventional means, your fix won't get > invoked (nor should it; releasing in bulk without having to care for > individual allocations is the point of Metaspace). > @tstuefe > > > In case the class gets unloaded via conventional means, your fix won't get > > invoked (nor should it; releasing in bulk without having to care for > > individual allocations is the point of Metaspace). > > Yes. This is intended. The deallocation path via Metaspace is fine. It is > just the code that is purging previous versions of a redefined class that has > this bug. I see it now. On class unloading, we reset the jmethodID "master table" linked list nodes, set all slots to null, but don't delete them to keep outside users from crashing. And we release the IK itself and before we do that free the methodID cache in IK. So we are covered. I thought this was about class unloading, sorry for not reading the description carefully. - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824662450
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6]
On Thu, 23 Nov 2023 13:37:41 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method >> redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` >> instance. Once that method gets redefined, the `jmethodID` is updated to >> point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original >> `jmethodID` will be redirected to the latest `Method` version and at the >> same time the 'previous' `Method` version will receive a new `jmethodID` >> pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or >> `GetAllStackTraces` JVMTI calls while this previous `Method` version is >> still on stack we will have the corresponding frame identified by a >> `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for >> cleanup at what time all contained `Method` instances. The cleanup process >> will not perform the `jmethodID` pointer maintenance and we will end up with >> pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to >> `ClassLoaderData` instance and all relevant `jmethodID`s will get >> batch-updated when the class loader is being released and all its classes >> are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being >> deallocate the associated `jmethodID` (if any) must not point to the >> deallocated instance once we are finished. Unfortunately, we can not just >> update the `jmethodID` values in bulk when purging an old class version - >> the per `InstanceKlass` jmethodID cache is present only for the main class >> version and contains `jmethodID` values for both the old and current method >> versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to >> deallocate a `Method` instance and clean up the pointer only if that >> `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my >> blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one > additional commit since the last revision: > > Add exhaustive check for method holder availability (1) src/hotspot/share/oops/instanceKlass.cpp line 541: > 539: assert (!method->on_stack(), "shouldn't be called with methods on > stack"); > 540: // Do the pointer maintenance before releasing the metadata > 541: method->clear_jmethod_id(); IIUC this is O(n^2) over number of methods. Seeing that this is a workaround for a special case (an app that does a lot of retransforms *and* uses async profiler), I'd opt for making it conditional. - PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403449172
Re: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2]
On Thu, 23 Nov 2023 08:44:08 GMT, Jaroslav Bachorik wrote: >> Jaroslav Bachorik has updated the pull request incrementally with three >> additional commits since the last revision: >> >> - Clean up imports >> - Simplify Method::clear_jmethod_id() >> - Use correct copyrights > > @dholmes-ora >> Can't we just check method->method_holder() for null rather than passing in >> a parameter like this? > > I have removed the argument and I am now performing the check for > `method_holder() != nullptr` as recommended. The code is a bit simpler and > the cost of resolving the method holder for each method is probably quite low > so we should be ok here. @jbachorik You are aware that this fix only works for some uncommon corner cases, right? It only works if the Method is explicitly deallocated. The vast bulk of Method aren't. Method, as a Metaspace object, is released in bulk when the class is unloaded. The `::deallocate` path you fixed up - that eventually ends up in `MetaspaceArena::deallocate()` - is a rare case that only gets invoked if - a class cannot be loaded but parts of it have already been loaded into Metaspace. - a class gets transformed In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). - PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824509686
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Thu, 23 Nov 2023 05:55:28 GMT, Thomas Stuefe wrote: >> suchismith1993 has updated the pull request incrementally with one >> additional commit since the last revision: >> >> change macro position > > src/hotspot/os/aix/os_aix.cpp line 3069: > >> 3067: while (end > 0 && buffer[end] != '.') { >> 3068: end = end - 1; >> 3069: } > > Use strrchr. Pls handle the case where the string contains no dot. - PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941387
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared >> library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load >> fails.It fails to identify the shared library ibm_16_am.a shared archive >> file on AIX. >> Hence we are providing a function which will additionally search for .a file >> on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional > commit since the last revision: > > change macro position I'm not a big fan of this approach. We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions. No other platform has such a large ifdef footprint in shared code. I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space: If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases? *If* this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too. src/hotspot/os/aix/os_aix.cpp line 3065: > 3063: //Replaces provided path with alternate path for the given file,if it > doesnt exist. > 3064: //For AIX,this replaces .so with .a. > 3065: void os::Aix::mapAlternateName(char* buffer, const char *extension) { The documentation is wrong: // Replaces the specified path with an alternative path for the given file if the original path doesn't exist It does no such thing, it replaces the extension unconditionally. The comment sounds like it does a file system check. That does not happen here. The whole function is not well named - "map alternate name" does not really tell me anything, I need to look at the implementation and the caller to understand what it is doing. There is no mapping here, this is just a string utility function. The function should not modify the original buffer but instead assemble a copy. That is the conventional way to do these things. You can work with immutable strings as input, e.g. literals, and don't risk buffer overflows. All of this should be handled inside os_aix.cpp; see my other comment. This should not live in the external os::aix interface, since it has nothing to do with AIX. *If* this is needed in generic code, which I don't think, then this should be made generic utility API, available on all platforms, and with a small gtest. But I think all of this should be confined to os_aix.cpp. Proposal for a clearer name, comment, and pseudocode // Given a filename with an extension, return a new string containing the filename with the new extension. // New string is allocated in resource area. static char* replace_extension_in_filename(const char* filename, const char* new_extension) { - allocate buffer in RA - assemble new path by contacting old path - old extension + new extension - return new path } src/hotspot/os/aix/os_aix.cpp line 3069: > 3067: while (end > 0 && buffer[end] != '.') { > 3068: end = end - 1; > 3069: } Use strrchr. src/hotspot/os/aix/os_aix.cpp line 3072: > 3070: buffer[end] = '\0'; > 3071: strcat(buffer, extension); > 3072: } This is a buffer overrun waiting to happen if replacement is larger than original extension. - Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1745743102 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402944936 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941240 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941730
Re: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> JDK-8320005 : Native library suffix impact on hotspot code in AIX > > suchismith1993 has updated the pull request incrementally with one additional > commit since the last revision: > > change macro position Hi, is this patch meant for review already? If yes, could you please describe the problem you fix, and how you fix it? If no, I suggest working on it in draft state till its ready for review. - PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1823105203
Integrated: JDK-8318671: Potential uninitialized uintx value after JDK-8317683
On Tue, 24 Oct 2023 07:08:07 GMT, Thomas Stuefe wrote: > When using 'MemStat' CompileCommand, we accidentally register the command if > an invalid suboption had been specified. Fixed, added regression test > (verified). This pull request has now been integrated. Changeset: 2e34a2eb Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/2e34a2ebf0f14043b129461b0397495e7e75a38b Stats: 105 lines in 3 files changed: 76 ins; 17 del; 12 mod 8318671: Potential uninitialized uintx value after JDK-8317683 Reviewed-by: thartmann, shade - PR: https://git.openjdk.org/jdk/pull/16335
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v6]
On Wed, 15 Nov 2023 06:22:58 GMT, Thomas Stuefe wrote: >> When using 'MemStat' CompileCommand, we accidentally register the command if >> an invalid suboption had been specified. Fixed, added regression test >> (verified). > > Thomas Stuefe has updated the pull request incrementally with one additional > commit since the last revision: > > feedback shade MacOS error unrelated. - PR Comment: https://git.openjdk.org/jdk/pull/16335#issuecomment-1812132637
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v6]
> When using 'MemStat' CompileCommand, we accidentally register the command if > an invalid suboption had been specified. Fixed, added regression test > (verified). Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback shade - Changes: - all: https://git.openjdk.org/jdk/pull/16335/files - new: https://git.openjdk.org/jdk/pull/16335/files/7abf7cc4..591ce6e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16335=05 - incr: https://webrevs.openjdk.org/?repo=jdk=16335=04-05 Stats: 27 lines in 1 file changed: 9 ins; 8 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16335.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16335/head:pull/16335 PR: https://git.openjdk.org/jdk/pull/16335
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v5]
On Tue, 14 Nov 2023 15:03:42 GMT, Aleksey Shipilev wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' into >> JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 >> - Fix Windows build >> - remove tab >> - Make patch more palatable >> - Merge branch 'openjdk:master' into >> JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 >> - JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 > > src/hotspot/share/compiler/compilerOracle.cpp line 678: > >> 676: // Parse an uintx-based option value. Also takes care of parsing enum >> values for options that are enums. >> 677: // Returns true if ok, false if the value could not be parsed. >> 678: static bool parseUintxValue(enum CompileCommand option, const char* >> line, uintx& value, int& bytes_read) { > > It is honestly weird to see `parse***Uintx***Value` dealing with enums, and > be specialized for `MemStat`. Can you reflow this to match how `MemLimit` > does it? > https://github.com/openjdk/jdk/blob/7bb1999c51cdfeb020047e1094229fda1ec5a99d/src/hotspot/share/compiler/compilerOracle.cpp#L702 Okay, rewritten in the style of https://github.com/openjdk/jdk/pull/16631. Retested too. Let's get this out of the door, please. - PR Review Comment: https://git.openjdk.org/jdk/pull/16335#discussion_r1393696933
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v5]
On Mon, 13 Nov 2023 13:33:26 GMT, Thomas Stuefe wrote: >> When using 'MemStat' CompileCommand, we accidentally register the command if >> an invalid suboption had been specified. Fixed, added regression test >> (verified). > > Thomas Stuefe has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into > JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 > - Fix Windows build > - remove tab > - Make patch more palatable > - Merge branch 'openjdk:master' into > JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 > - JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 @shipilev maybe, as bug owner? - PR Comment: https://git.openjdk.org/jdk/pull/16335#issuecomment-1809719133
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v5]
> When using 'MemStat' CompileCommand, we accidentally register the command if > an invalid suboption had been specified. Fixed, added regression test > (verified). Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 - Fix Windows build - remove tab - Make patch more palatable - Merge branch 'openjdk:master' into JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 - JDK-8318671-Potential-uninitialized-uintx-value-after-JDK-8317683 - Changes: https://git.openjdk.org/jdk/pull/16335/files Webrev: https://webrevs.openjdk.org/?repo=jdk=16335=04 Stats: 110 lines in 3 files changed: 79 ins; 21 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16335.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16335/head:pull/16335 PR: https://git.openjdk.org/jdk/pull/16335
Re: RFR: JDK-8318671: Potential uninitialized uintx value after JDK-8317683 [v4]
On Mon, 13 Nov 2023 11:51:30 GMT, Tobias Hartmann wrote: > I think this is ready for integration given that both @dholmes-ora and > @jdksjolen are okay with it. Well, they did not approve yet; @jdksjolen or @dholmes-ora, if you are happy with this, could you hit the big green button please? - PR Comment: https://git.openjdk.org/jdk/pull/16335#issuecomment-1808169293