Reasons to add UTF-16 versions of the JNI_CreateJavaVM and 
JNI_GetDefaultJavaVMInitArgs APIs include the following:

* The arguments passed into the wmain and wWinMain functions use UTF-16-encoded 
strings instead of UTF-8 strings.
* The arguments passed into the main and WinMain functions on Windows platforms 
are in the ANSI character encoding instead of the UTF-8 character encoding.
* The arguments passed into the wmain and wWinMain functions would need to be 
converted to UTF-8 or modified UTF-8 encoding unless a UTF-16 version of 
JNI_CreateJavaVM is added.
* The NewString and GetStringChars APIs in the JNI already use UTF-16-encoded 
strings.
* Unicode APIs on Windows normally use UTF-16-encoded strings.
* The C11 and C++11 standards support UTF-16 strings through the char16_t type 
and UTF-16 literals with a u prefix.
* Windows platforms have long supported UTF-16 strings in C and C++ through the 
wchar_t type and UTF-16 literals with an L prefix.
* A UTF-16 version of JNI_CreateJavaVM would allow command-line arguments to be 
passed into the JVM without the conversion from the platform-dependent encoding 
to UTF-16 that the JVM currently has to perform.
* A UTF-16 version of JNI_CreateJavaVM would improve consistency across 
different locales on Windows-based platforms, since the command-line arguments 
could be passed into the JVM in a locale-independent manner.
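
To illustrate the conversion mentioned in the third bullet, here is a minimal
sketch of what a launcher would have to do with wmain-style UTF-16 arguments
if the invocation API only accepted Modified UTF-8. The function name
utf16_to_modified_utf8 is hypothetical and not part of the JNI, and uint16_t
stands in for jchar so that the sketch compiles without jni.h. It assumes the
usual Modified UTF-8 rules: U+0000 is written as the two bytes C0 80, and each
UTF-16 code unit (including each half of a surrogate pair) is encoded on its
own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the JNI: encodes `len` UTF-16 code units
 * from `src` as Modified UTF-8 into `dst` and NUL-terminates the result.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if `dst_size` is too small. */
size_t utf16_to_modified_utf8(const uint16_t *src, size_t len,
                              char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++) {
        uint16_t c = src[i];
        /* U+0001..U+007F take one byte; U+0000 and U+0080..U+07FF take two;
         * everything else, surrogates included, takes three. */
        size_t need = (c >= 0x0001 && c <= 0x007F) ? 1
                    : (c <= 0x07FF)                ? 2
                                                   : 3;

        /* Always leave room for the trailing NUL. */
        if (n + need + 1 > dst_size)
            return (size_t)-1;

        if (need == 1) {
            dst[n++] = (char)c;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (c >> 6));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        } else {
            dst[n++] = (char)(0xE0 | (c >> 12));
            dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
    }

    if (n + 1 > dst_size)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

A launcher built on wmain would call something like this once per argument
before filling in JavaVMOption.optionString. With a UTF-16 entry point, that
step (and the buffer sizing that goes with it) would not be needed.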
________________________________
From: David Holmes <[email protected]>
Sent: Sunday, May 7, 2017 7:47 PM
To: John Platts
Cc: hotspot-dev developers; core-libs-dev Libs
Subject: Re: Add support for Unicode versions of JNI_CreateJavaVM and 
JNI_GetDefaultJavaVMInitArgs on Windows platforms

Added back jdk10-dev as a bcc.

Added hotspot-dev and core-libs-dev (for launcher) for follow up
discussions.

Hi John,

On 8/05/2017 10:33 AM, John Platts wrote:
> I actually did a search through the code that implements
> JNI_CreateJavaVM, and I found that the conversion of the strings is done
> using java_lang_String::create_from_platform_dependent_str, which
> converts from the platform-default encoding to Unicode. In the case of
> Windows-based platforms, the conversion is done based on the ANSI
> character encoding instead of UTF-8 or Modified UTF-8.
>
>
> The platform encoding detection logic on Windows is implemented in
> java_props_md.c, which can be found at
> jdk/src/windows/native/java/lang/java_props_md.c in releases prior to
> JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK
> 9 and later. The encoding used for command-line arguments passed into
> the JNI invocation API is Cp1252 for English locales on Windows
> platforms, and not Modified UTF-8 or UTF-8.
>
>
> The documentation found at
> http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html
> also
> states that the strings passed into JNI_CreateJavaVM are in the
> platform-default encoding.

Thanks for the additional details. I assume you are referring to:

typedef struct JavaVMOption {
     char *optionString;  /* the option as a string in the default platform encoding */

That comment should not form part of the specification as it is
non-normative text. If the intent is truly to use the platform default
encoding and not UTF-8 then that should be very clearly spelt out in the
spec!

That said, the implementation is following this so it is a limitation. I
suspect this is historical.

> A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should
> be added to the JNI Invocation API. The java.exe launchers and javaw.exe
> launchers should also be updated to use the UTF-16 version of the
> JNI_CreateJavaVM function on Windows platforms and to use wmain and
> wWinMain instead of main and WinMain.

Why versions for UTF-16 instead of the missing UTF-8 variants? As I said
the whole spec is intended to be based around UTF-8 so we would not want
to throw in just a couple of UTF-16 based usages.

Thanks,
David

>
> A few files in HotSpot would need to be changed in order to implement
> the UTF-16 version of JNI_CreateJavaVM, but the change would improve
> consistency across different locales on Windows platforms and allow
> arguments that contain Unicode characters that are not available in the
> platform-default encoding to be passed into the JVM on the command line.
>
>
> The UTF-16-based version of JNI_CreateJavaVM also makes it easier to
> allocate string objects that contain non-ASCII characters as the strings
> are already in UTF-16 format, at least in cases where the strings
> contain Unicode characters that are not in Latin-1 or on VMs that do not
> support compact Latin-1 strings.
>
>
> The UTF-16-based version of JNI_CreateJavaVM should probably be
> implemented as a separate function so that the solution could be
> backported to JDK 8 and JDK 9 updates and so that backwards
> compatibility with the current JNI_CreateJavaVM implementation is
> maintained.
>
>
> Here is what the new UTF-16-based API might look like:
>
> typedef struct JavaVMOption_UTF16 {
>     const jchar *optionString;  /* the option as a string in UTF-16 encoding */
>     void *extraInfo;
> } JavaVMOptionUTF16;
>
>
> typedef struct JavaVMInitArgs_UTF16 {
>     jint version;
>     jint nOptions;
>     JavaVMOptionUTF16 *options;
>     jboolean ignoreUnrecognized;
> } JavaVMInitArgsUTF16;
>
> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
>
> jint JNI_CreateJavaVM_UTF16(JavaVM **p_vm, void **p_env, void *vm_args);
>
>
> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
>
> jint JNI_GetDefaultJavaVMInitArgs_UTF16(void *vm_args);
>
> ------------------------------------------------------------------------
> *From:* David Holmes <[email protected]>
> *Sent:* Thursday, May 4, 2017 11:07 PM
> *To:* John Platts; [email protected]
> *Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and
> JNI_GetDefaultJavaVMInitArgs on Windows platforms
>
> Hi John,
>
> The JNI is defined to use Modified UTF-8 format for strings, so any
> Unicode character should be handled if passed in in the right format.
> Updating the JNI specification and implementation to accept UTF-16
> directly would be a major undertaking.
>
> Is the issue here that you want a tool, like the java launcher, to
> accept arbitrary Unicode strings in an end-user-friendly manner and then
> have it perform the modified UTF-8 conversion when invoking the VM?
>
> Can you give a concrete example of what you would like to be able to
> pass as arguments to the JVM?
>
> Thanks,
> David
>
> On 5/05/2017 1:04 PM, John Platts wrote:
>> The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI
>> invocation API expect ANSI strings on Windows platforms instead of
>> Unicode-encoded strings. This is an issue on Windows-based platforms since
>> some of the option strings that are passed into JNI_CreateJavaVM might
>> contain Unicode characters that are not in the ANSI encoding on Windows
>> platforms.
>>
>>
>> There is support for UTF-16 literals on Windows platforms with wchar_t and 
>> wide character literals prefixed with the L prefix, and on platforms that 
>> support C11 and C++11 with char16_t and UTF-16 character literals that are 
>> prefixed with the u prefix.
>>
>>
>> jchar is currently defined to be a typedef for unsigned short on all
>> platforms, but char16_t is a separate type and not a typedef for unsigned
>> short or jchar in C++11 and later. jchar should be changed to be a typedef
>> for wchar_t on Windows platforms and to be a typedef for char16_t on
>> non-Windows platforms that support the char16_t type. This change will make
>> it possible to define jchar character and string literals on Windows
>> platforms and on non-Windows platforms that support the C11 or C++11
>> standard.
>>
>>
>> The JCHAR_LITERAL macro should be added to the JNI header and defined as 
>> follows on Windows:
>>
>> #define JCHAR_LITERAL(x) L ## x
>>
>>
>> The JCHAR_LITERAL macro should be added to the JNI header and defined as 
>> follows on non-Windows platforms:
>>
>> #define JCHAR_LITERAL(x) u ## x
>>
>>
>> Here is how the Unicode version of JNI_CreateJavaVM and 
>> JNI_GetDefaultJavaVMInitArgs could be defined:
>>
>> typedef struct JavaVMUnicodeOption {
>>     const jchar *optionString;  /* the option as a string in UTF-16 encoding */
>>     void *extraInfo;
>> } JavaVMUnicodeOption;
>>
>> typedef struct JavaVMUnicodeInitArgs {
>>     jint version;
>>     jint nOptions;
>>     JavaVMUnicodeOption *options;
>>     jboolean ignoreUnrecognized;
>> } JavaVMUnicodeInitArgs;
>>
>> jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args);
>> jint JNI_GetDefaultJavaVMInitArgsUnicode(void *args);
>>
>> The java.exe wrapper should use wmain instead of main on Windows platforms,
>> and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows
>> platforms. This change, along with support for Unicode-enabled versions of
>> the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow
>> the JVM to be launched with arguments that contain Unicode characters that
>> are not in the platform-default encoding.
>>
>> All of the Windows platforms that Java SE 10 and later VMs would be
>> supported on do support Unicode. Adding support for Unicode versions of
>> JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode
>> characters that are not in the platform-default encoding on Windows
>> platforms to be supported in command-line arguments that are passed to the
>> JVM.
>>
