Re: RFR: 8195129: System.load() fails to load from unicode paths [v5]

Maxim Kartashev Mon, 07 Jun 2021 09:28:07 -0700

On Fri, 4 Jun 2021 13:36:27 GMT, Maxim Kartashev 
<[email protected]> wrote:


>> Character strings within JVM are produced and consumed in several formats. 
>> Strings come from/to Java in the UTF8 format and POSIX APIs (like fprintf() 
>> or dlopen()) consume strings also in UTF8. On Windows, however, the 
>> situation is far less simple: some new(er) APIs expect UTF16 (wide-character 
>> strings), some older APIs can only work with strings in a "platform" format, 
>> where not all UTF8 characters can be represented; which ones can depends on 
>> the current "code page".
>> 
>> This commit switches the Windows version of native library loading code to 
>> using the new UTF16 API `LoadLibraryW()` and attempts to streamline the use 
>> of various string formats in the surrounding code. 
>> 
>> Namely, exception messages are made to consume strings explicitly in the 
>> UTF8 format, while logging functions (that end up using legacy Windows API) 
>> are made to consume "platform" strings in most cases. One exception is 
>> `JVM_LoadLibrary()` logging where the UTF8 name of the library is logged, 
>> which can, of course, be fixed, but was considered not worth the additional 
>> code (NB: this isn't a new bug).
>> 
>> The test runs in a separate JVM in order to make NIO happy about non-ASCII 
>> characters in the file name; tests are executed with LC_ALL=C and that 
>> doesn't let NIO work with non-ASCII file names even on Linux or MacOS.
>> 
>> Tested by running `test/hotspot/jtreg:tier1` on Linux and 
>> `jtreg:test/hotspot/jtreg/runtime` on Windows 10. The new test (`   
>> jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode`) was explicitly ran 
>> on those platforms as well.
>> 
>> Results from Linux:
>> 
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR 
>>   
>>    jtreg:test/hotspot/jtreg:tier1                     1784  1784     0     0 
>>   
>> ==============================
>> TEST SUCCESS
>> 
>> 
>> Building target 'run-test-only' in configuration 
>> 'linux-x86_64-server-release'
>> Test selection 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode', 
>> will run:
>> * jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode
>> 
>> Running test 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode'
>> Passed: runtime/jni/loadLibraryUnicode/LoadLibraryUnicodeTest.java
>> Test results: passed: 1
>> 
>> 
>> Results from Windows 10:
>> 
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/runtime                    746   746     0     0
>> ==============================
>> TEST SUCCESS
>> Finished building target 'run-test-only' in configuration 
>> 'windows-x86_64-server-fastdebug'
>> 
>> 
>> Building target 'run-test-only' in configuration 
>> 'windows-x86_64-server-fastdebug'
>> Test selection 'test/hotspot/jtreg/runtime/jni/loadLibraryUnicode', will run:
>> * jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode
>> 
>> Running test 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode'
>> Passed: runtime/jni/loadLibraryUnicode/LoadLibraryUnicodeTest.java
>> Test results: passed: 1
>
> Maxim Kartashev has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Updated the test to run on Windows only and to use a character from the
>   supplementary plane in the path name.

I came to realize that changing `os::dll_load()` to accept UTF-8 (standard or 
modified) will break all the users of that function except `JVM_LoadLibrary()`. 
Consider `os::native_java_library()` that still operates with the platform 
encoding on Windows and works correctly if CWD contains Latin-1 characters 
(assuming 1252 code page). With this change, `java` will fail to start if its 
path name contains, say, Æ because `os::dll_load()` will expect it to be 
encoded as `c3 86` (UTF-8), but will get `c6` (Latin-1) instead.

One possible solution is to update all the call sites of `os::dll_load()` 
(quite laborous), another is to introduce `os::dll_load_utf8()` and change only 
`JVM_LoadLibrary()` at this point in time.

Advice is welcome.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4169

Re: RFR: 8195129: System.load() fails to load from unicode paths [v5]

Reply via email to