On Fri, 28 Jan 2022 15:07:56 GMT, Tyler Steele <[email protected]> wrote:
>> Just in time for the holidays I have completed an implementation of the JFR
>> functionality for AIX. As a side note, this is my first submission to
>> OpenJDK 👋
>>
>> ### Implementation notes and alternatives considered
>>
>> After modifying the build system to allow the --enable-jvm-feature-jfr to
>> work on AIX, my task was to implement the interfaces from os_perf.hpp. The
>> os_perf_aix.cpp implementation that existed was, I believe, a copy of the
>> Linux implementation. A review of the code in that file showed that
>> NetworkInterface, CPUPerformanceInterface, and SystemProcessInterface would
>> require modification to work on AIX. Using the Linux implementation as a
>> guide, I initially expected to use files from the aix procfs like
>> /proc/<pid>/psinfo, and /proc/<pid>/status in a manner similar to the Linux
>> implementation. However, I ended up using libperfstat for all functionality
>> required by the interfaces.
>>
>> ### Testing
>>
>> Testing for JFR seems to be quite light in the repo both before and after
>> this change. In addition to performing manual testing, I have added some
>> basic sanity checks that ensure events can be created and committed (using
>> jtreg), and performs some basic checks on the results of the interface
>> member functions (using gtest).
>>
>> ### More notes
>>
>> I've sent an email to the JFR group about a TOC overflow warning I
>> encountered while building (for the release server target). I believe the
>> fix is to pass -qpic=large when using the xlc toolchain, but my
>> modifications to flags-cflags.m4 have not been successful in removing this
>> warning.
>
> Tyler Steele has updated the pull request incrementally with two additional
> commits since the last revision:
>
> - Changes macoss -> macosx in problem list
> - Refactors loadlib_aix: Removes redundant c++ class
Hi Tyler,
looks better. See comments inline.
src/hotspot/os/aix/loadlib_aix.cpp line 116:
> 114: // entry_t* next;
> 115: // loaded_module_t info;
> 116: // };
please remove
src/hotspot/os/aix/loadlib_aix.cpp line 326:
> 324: // Helper for copying loaded module info to external callers.
> 325: // To avoid dangling pointers 'next' is set to null
> 326: static bool copy_lm_to_external(const loaded_module_t* const from,
> loaded_module_t* to) {
So, just to understand, the point of this function is to handle to==NULL (from
is never NULL) and to set next to NULL?
I'd probably keep doing this inline in the two places where this happens, but I
leave it up to you. If you think this is cleaner, its fine too.
src/hotspot/os/aix/loadlib_aix.hpp line 37:
> 35:
> 36: #include "misc_aix.hpp"
> 37: #include "runtime/os.hpp"
Bikeshed: I would like to have avoided including os.hpp (its a monster) - hence
my recommendation about a separate callback or closure definition. But it is
not super important, this is fine too.
src/hotspot/os/aix/os_perf_aix.cpp line 72:
> 70:
> 71: static bool initialize_libperfstat() {
> 72: static bool is_libperfstat_loaded = false;
I don't think this is needed. libperfstat is initialized as part of os::init in
os_aix.cpp.
https://github.com/openjdk/jdk/blob/d366d15d67a08833d93a5806edef8145cb7803e5/src/hotspot/os/aix/os_aix.cpp#L2999-L3008
https://github.com/openjdk/jdk/blob/d366d15d67a08833d93a5806edef8145cb7803e5/src/hotspot/os/aix/os_aix.cpp#L2329-L2333
Should be already active at this point. Its functions are stubbed out in case
the library cannot be loaded, so you should be able to call them in here,
without an additional init check. You just have to handle the errors on the
individual API calls, as if the real APIs returned errors. But you do this
already I think.
Long term, if you think libperfstat should be always there, we can simplify
things and treat libperfstat load errors as hard VM errors in release builds
too. We already assert in `os::Aix::initialize_libperfstat()`. (Note: "assert"
fires in debug builds (ASSERT macro), and "guarantee" fires in all builds).
src/hotspot/os/aix/os_perf_aix.cpp line 495:
> 493: char* name = NEW_C_HEAP_ARRAY(char, IDENTIFIER_LENGTH,
> mtInternal);
> 494: char* exe_name = NEW_C_HEAP_ARRAY(char, PRFNSZ, mtInternal);
> 495: char* cmd_line = NEW_C_HEAP_ARRAY(char, PRARGSZ, mtInternal);
So the contract with SystemProcess is that its ctor arguments need to be C-heap
allocated, and releases them itself? ... okay. May be worthwhile to clean up at
some point, or at least comment.
src/hotspot/os/aix/os_perf_aix.cpp line 611:
> 609:
> 610: // calling perfstat_<subsystem>(NULL, NULL, _, 0) returns number of
> available records
> 611: if ((n_records = libperfstat::perfstat_netinterface(NULL, NULL,
> sizeof(perfstat_netinterface_t), 0)) <= 0) {
is n=0 an error? Could there conceivably be zero interfaces?
src/hotspot/os/aix/os_perf_aix.cpp line 631:
> 629:
> net_stats[i].ibytes,
> 630:
> net_stats[i].obytes,
> 631:
> *network_interfaces);
When looking at NetworkInterface, I saw that it copies the name into its own
resource-area allocated string in the constructor. This is all over the place,
SystemProcess assumes C-heap storage, NetworkInterface uses RA... :(
The problem now is that your ResourceMark above may also destroy the internal
string in NetworkInterface when the Stack is unwound beyond this function.
Side note: this is what I don't like about how hotspot uses RA. Its often not
clear which data are RA allocated without looking at the code. In Objective-C
(at least how its used on Mac) they had a clear naming convention for methods
that returned storage from the AutoReleasePool.
The problem is also that with ResourceMarks cutting off your data, you may not
see the problem until much later. Usually you only see immediate problems if
the underlying arena (a chained list of malloced slabs) added a new slab under
your ResourceMark, which then gets free'd when the ResourceMark goes out of
scope.
How to fix this:
1) revert to NEW_C_HEAP_ARRAY for your records array here, with appropriate
cleanup, and remove ResourceMark
3) Change NetworkInterface class to use C-heap instead.
I'd opt for (1) since it avoids discussions about changing shared code. I
opened https://bugs.openjdk.java.net/browse/JDK-8280912 to change
NetworkInterface, but don't wait for me.
-------------
Changes requested by stuefe (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6885