On Tue, 5 Dec 2023 12:11:46 GMT, Joachim Kern <jk...@openjdk.org> wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may 
>> result in different, unique dl handles to be returned from libc. In that it 
>> differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. 
>> One such problem is in the JVMTI agent handler. That problem was fixed with 
>> a local fix to said handler 
>> ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this 
>> fix causes follow-up problems since it assumes that the file name passed to 
>> `os::dll_load()` is the file that has been opened. It prevents efficient, 
>> os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. 
>> See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it 
>> is a hack that causes other, more uglier hacks to follow (see discussion of 
>> https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and 
>> os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached 
>> handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See 
>> [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other 
>> platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   encapsulate everything in os::Aix::dlopen

Excellent, this is how I have pictured a good solution. Very nice.

A number of remarks, but nothing fundamental.

src/hotspot/os/aix/os_aix.cpp line 1137:

> 1135:     if (ebuf != nullptr && ebuflen > 0) {
> 1136:       ::strncpy(ebuf, "dll_load: empty filename specified", ebuflen - 
> 1);
> 1137:     }

Are there any cases where we don't hand in the error buffer? If so, I would 
just assert ebuf and ebuflen. No need for this kind of flexibility.

src/hotspot/os/aix/os_aix.cpp line 3051:

> 3049: 
> 3050: // Simulate the library search algorithm of dlopen() (in os::dll_load)
> 3051: int os::Aix::stat64x_via_LIBPATH(const char* path, struct stat64x* 
> stat) {

- no need to export this, make it filescope static
- please return bool, with false = error
- please rename it to something like "search_file_in_LIBPATH"

src/hotspot/os/aix/os_aix.cpp line 3055:

> 3053:     return -1;
> 3054: 
> 3055:   char *path2 = strdup (path);

Please use os::strdup and os::free. If you really intent to use the plain libc 
versions, use `::strdup` and `::free` to make sure - and indicate to code 
readers - you use the global libc variants.

src/hotspot/os/aix/os_aix.cpp line 3059:

> 3057:   int idx = strlen(path2) - 1;
> 3058:   if (path2[idx] == ')') {
> 3059:     while (path2[idx] != '(' && idx > 0) idx--;

? Why not `strrchr()`?

src/hotspot/os/aix/os_aix.cpp line 3067:

> 3065:   if (path2[0] == '/' ||
> 3066:       (path2[0] == '.' && (path2[1] == '/' ||
> 3067:                           (path2[1] == '.' && path2[2] == '/')))) {

This complexity is not needed, nor is it sufficient, since it does not handle 
relative paths ("mydirectory/hallo.so")

https://www.ibm.com/docs/en/aix/7.1?topic=d-dlopen-subroutine

"If FilePath contains a slash character, FilePath is used directly, and no 
directories are searched. "

So, just scan for a '/' - if you find one, its a path to be opened directly:


const bool use_as_filepath = strchr(path2, '/');

src/hotspot/os/aix/os_aix.cpp line 3085:

> 3083:   strcpy(libpath, env);
> 3084:   for (token = strtok_r(libpath, ":", &saveptr); token != nullptr; 
> token = strtok_r(nullptr, ":", &saveptr)) {
> 3085:     sprintf(combined, "%s/%s", token, path2);

You can save a lot of pain and manual labor by using `stringStream` here. 


stringStream combined;
combined.print("%s/%s", token, path2);
const char* combined_path_string = combined.base();

no need for manual allocation and byte counting.

src/hotspot/os/aix/os_aix.cpp line 3099:

> 3097: // filled by os::dll_load(). This way we mimic dl handle equality for a 
> library
> 3098: // opened a second time, as it is implemented on other platforms.
> 3099: void* os::Aix::dlopen(const char* filename, int Flags) {

Does not need to be exported, nor does os::AIX::dlclose. Make file scope 
static. See my remarks in os_posix.cpp.

src/hotspot/os/aix/os_aix.cpp line 3103:

> 3101:   struct stat64x libstat;
> 3102: 
> 3103:   if (os::Aix::stat64x_via_LIBPATH(filename, &libstat)) {

Please return bool, not unix int -1, this hurts my brain :-)

src/hotspot/os/aix/os_aix.cpp line 3108:

> 3106:     if (result != nullptr) {
> 3107:       assert(false, "dll_load: Could not stat() file %s, but dlopen() 
> worked; Have to improve stat()", filename);
> 3108:     }

Since this is just assert code, I'd wrap all this stuff in #ifdef ASSERT. No 
need for needless dlopens otherwise.

src/hotspot/os/aix/os_aix.cpp line 3125:

> 3123:     }
> 3124:     if (i == g_handletable_used) {
> 3125:       // library not still loaded. Check if there is space left in array

s/still/yet

src/hotspot/os/aix/os_aix.cpp line 3131:

> 3129:         pthread_mutex_unlock(&g_handletable_mutex);
> 3130:         assert(false, "max_handletable reached");
> 3131:         return nullptr;

Please, for the sake of release code, hand in an error buffer and fill it with 
something that makes sense, eg. "too many libraries loaded".
The assert is still okay, I guess, since we don't expect it to fire during 
tests; if it does fire, it may indicate a problem in our handle table logic or 
a wrong assumption about handle équality.

src/hotspot/os/aix/os_aix.cpp line 3133:

> 3131:         return nullptr;
> 3132:       }
> 3133:       // library not still loaded and still place in array, so load 
> library

s/still/yet

src/hotspot/os/aix/os_aix.cpp line 3143:

> 3141:         g_handletable[i].devid = libstat.st_dev;
> 3142:         g_handletable[i].refcount = 1;
> 3143:       }

Error handling: on error, call dlerror and return error string inside the error 
buffer you should hand in. All other platforms do this too.

src/hotspot/os/aix/os_aix.cpp line 3150:

> 3148: }
> 3149: 
> 3150: int os::Aix::dlclose(void* lib) {

can we call lib something better, maybe "handle"?

src/hotspot/os/aix/os_aix.cpp line 3165:

> 3163:       // refcount == 0, so we have to ::dlclose() the lib
> 3164:       // and delete the entry from the array.
> 3165:       res = ::dlclose(lib);

Handle dlclose error. We expect it to work; if it doesn't, it indicates that 
something is wrong with the handle logic, e.g. an invalid or already closed 
handle had been handed in. So, assert.

src/hotspot/os/aix/os_aix.hpp line 185:

> 183:   // opened a second time, as it is implemented on other platforms.
> 184:   static void* dlopen(const char* filename, int Flags);
> 185:   static int dlclose(void* lib);

Remove; should not be exported.

src/hotspot/os/posix/os_posix.cpp line 735:

> 733:     l_path = "<not available>";
> 734:   }
> 735:   int res = AIX_ONLY(os::Aix)::dlclose(lib);

Lets do this cleaner, and in the style of hotspot coding elsewhere:

- introduce a new function "os::pd_dll_unload(handle, errorbuf, errbuflen)". 
Add it to os.hpp, but somewhere non-public. The implementations will live in 
os_aix.cpp, os_bsd.cpp and os_linux.cpp.
- make os::Aix::dlclose -> os::pd_dll_unload; the only difference is that you 
should fill in error buffer with either ::dlerror or, if you have errors in 
handle table, a text describing that error
- on all other posix platforms (os_bsd.cpp + os_linux.cpp), implement a minimal 
version of os::pd_dll_unload() that calls ::dlunload, and on error calls 
::dlerror and copies the string into errbuf
- Here, call os::pd_dll_unload instead of ::dlclose/os::aix::dlclose
- change the JFR code below to not use ::dlerror but the string returned from 
the buffer

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1762354122
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415568694
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415577023
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415588986
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415577568
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415585089
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415594396
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415625152
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415595301
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415596399
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415597594
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415599081
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415601350
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415607920
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415612828
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415619700
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415625511
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415638462

Reply via email to