Reasons to add UTF-16 versions of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs APIs include the following:
* The arguments passed into the wmain and wWinMain functions use UTF-16-encoded strings instead of UTF-8 strings.
* The arguments passed into the main and WinMain functions on Windows platforms are in the ANSI character encoding instead of the UTF-8 character encoding.
* The arguments passed into the wmain and wWinMain functions would need to be converted to UTF-8 or modified UTF-8 encoding unless a UTF-16 version of JNI_CreateJavaVM is added.
* The NewString and GetStringChars APIs in the JNI already use UTF-16-encoded strings.
* Unicode APIs on Windows normally use UTF-16-encoded strings.
* The C11 and C++11 standards support UTF-16 strings through the char16_t type and UTF-16 literals with a u prefix.
* Windows platforms have long supported UTF-16 strings in C and C++ through the wchar_t type and wide literals with an L prefix.
* A UTF-16 version of JNI_CreateJavaVM would allow command-line arguments to be passed into the JVM without the conversion from the platform-dependent encoding to UTF-16 that currently has to be done in the JVM.
* A UTF-16 version of JNI_CreateJavaVM would improve consistency across different locales on Windows-based platforms, since the command-line arguments can be passed into the JVM in a locale-independent manner.

________________________________
From: David Holmes <[email protected]>
Sent: Sunday, May 7, 2017 7:47 PM
To: John Platts
Cc: hotspot-dev developers; core-libs-dev Libs
Subject: Re: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms

Added back jdk10-dev as a bcc.

Added hotspot-dev and core-libs-dev (for launcher) for follow up discussions.

Hi John,

On 8/05/2017 10:33 AM, John Platts wrote:
> I actually did a search through the code that implements
> JNI_CreateJavaVM, and I found that the conversion of the strings is done
> using java_lang_String::create_from_platform_dependent_str, which
> converts from the platform-default encoding to Unicode. In the case of
> Windows-based platforms, the conversion is done based on the ANSI
> character encoding instead of UTF-8 or Modified UTF-8.
>
> The platform encoding detection logic on Windows is implemented in
> java_props_md.c, which can be found at
> jdk/src/windows/native/java/lang/java_props_md.c in releases prior to
> JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK
> 9 and later. The encoding used for command-line arguments passed into
> the JNI invocation API is Cp1252 for English locales on Windows
> platforms, and not Modified UTF-8 or UTF-8.
>
> The documentation found at
> http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html
> also states that the strings passed into JNI_CreateJavaVM are in the
> platform-default encoding.

Thanks for the additional details. I assume you are referring to:

typedef struct JavaVMOption {
    char *optionString; /* the option as a string in the default platform encoding */

That comment should not form part of the specification as it is non-normative text. If the intent is truly to use the platform default encoding and not UTF-8 then that should be very clearly spelt out in the spec! That said, the implementation is following this so it is a limitation. I suspect this is historical.
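
For reference, the UTF-16 to Modified UTF-8 conversion that a wmain-based launcher would otherwise have to perform can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name utf16_to_modified_utf8 is hypothetical and is not part of the JNI or of HotSpot, and uint_least16_t stands in for jchar so that the sketch builds without jni.h. It relies on the two ways Modified UTF-8 differs from standard UTF-8: U+0000 is written as the two bytes 0xC0 0x80, and each surrogate code unit is encoded on its own as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a JNI or HotSpot function): encodes len UTF-16
 * code units from src as Modified UTF-8 into dst and appends a NUL byte.
 * Returns the number of bytes written, not counting the terminator, or
 * (size_t)-1 if dst (capacity cap bytes) is too small. */
size_t utf16_to_modified_utf8(const uint_least16_t *src, size_t len,
                              unsigned char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned c = src[i] & 0xFFFFu;
        /* U+0001..U+007F take one byte; U+0000 and U+0080..U+07FF take
         * two bytes; everything else, surrogates included, takes three. */
        size_t need = (c >= 0x0001u && c <= 0x007Fu) ? 1
                    : (c <= 0x07FFu) ? 2 : 3;

        /* Always keep one byte in reserve for the terminating NUL. */
        if (out + need + 1 > cap)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)c;
        } else if (need == 2) {
            /* U+0000 also takes this path and becomes 0xC0 0x80. */
            dst[out++] = (unsigned char)(0xC0u | (c >> 6));
            dst[out++] = (unsigned char)(0x80u | (c & 0x3Fu));
        } else {
            /* Surrogate code units are encoded individually rather than
             * being combined into one four-byte sequence. */
            dst[out++] = (unsigned char)(0xE0u | (c >> 12));
            dst[out++] = (unsigned char)(0x80u | ((c >> 6) & 0x3Fu));
            dst[out++] = (unsigned char)(0x80u | (c & 0x3Fu));
        }
    }

    if (out + 1 > cap)          /* only reachable when len == 0 */
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

A launcher would still have to hand the result to a JNI_CreateJavaVM that actually interprets option strings as Modified UTF-8, which, per the discussion above, the current Windows implementation does not do.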

> A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should
> be added to the JNI Invocation API. The java.exe launchers and javaw.exe
> launchers should also be updated to use the UTF-16 version of the
> JNI_CreateJavaVM function on Windows platforms and to use wmain and
> wWinMain instead of main and WinMain.

Why versions for UTF-16 instead of the missing UTF-8 variants? As I said, the whole spec is intended to be based around UTF-8 so we would not want to throw in just a couple of UTF-16 based usages.

Thanks,
David

> A few files in HotSpot would need to be changed in order to implement
> the UTF-16 version of JNI_CreateJavaVM, but the change would improve
> consistency across different locales on Windows platforms and allow
> arguments that contain Unicode characters that are not available in the
> platform-default encoding to be passed into the JVM on the command line.
>
> The UTF-16-based version of JNI_CreateJavaVM also makes it easier to
> allocate string objects that contain non-ASCII characters as the strings
> are already in UTF-16 format, at least in cases where the strings
> contain Unicode characters that are not in Latin-1 or on VMs that do not
> support compact Latin-1 strings.
>
> The UTF-16-based version of JNI_CreateJavaVM should probably be
> implemented as a separate function so that the solution could be
> backported to JDK 8 and JDK 9 updates and so that backwards
> compatibility with the current JNI_CreateJavaVM implementation is
> maintained.
>
> Here is what the new UTF-16-based API might look like:
>
> typedef struct JavaVMOption_UTF16 {
>     const jchar *optionString; /* the option as a string in UTF-16 encoding */
>     void *extraInfo;
> } JavaVMOptionUTF16;
>
> typedef struct JavaVMInitArgs_UTF16 {
>     jint version;
>     jint nOptions;
>     JavaVMOptionUTF16 *options;
>     jboolean ignoreUnrecognized;
> } JavaVMInitArgsUTF16;
>
> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
> jint JNI_CreateJavaVM_UTF16(JavaVM **p_vm, void **p_env, void *vm_args);
>
> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
> jint JNI_GetDefaultJavaVMInitArgs_UTF16(void *vm_args);
>
> ------------------------------------------------------------------------
> *From:* David Holmes <[email protected]>
> *Sent:* Thursday, May 4, 2017 11:07 PM
> *To:* John Platts; [email protected]
> *Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and
> JNI_GetDefaultJavaVMInitArgs on Windows platforms
>
> Hi John,
>
> The JNI is defined to use Modified UTF-8 format for strings, so any
> Unicode character should be handled if passed in in the right format.
> Updating the JNI specification and implementation to accept UTF-16
> directly would be a major undertaking.
>
> Is the issue here that you want a tool, like the java launcher, to
> accept arbitrary Unicode strings in an end-user friendly manner and then
> have it perform the modified UTF-8 conversion when invoking the VM?
>
> Can you give a concrete example of what you would like to be able to
> pass as arguments to the JVM?
>
> Thanks,
> David
>
> On 5/05/2017 1:04 PM, John Platts wrote:
>> The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI
>> invocation API expect ANSI strings on Windows platforms instead of
>> Unicode-encoded strings.
>> This is an issue on Windows-based platforms since
>> some of the option strings that are passed into JNI_CreateJavaVM might
>> contain Unicode characters that are not in the ANSI encoding on Windows
>> platforms.
>>
>> There is support for UTF-16 literals on Windows platforms with wchar_t and
>> wide character literals prefixed with the L prefix, and on platforms that
>> support C11 and C++11 with char16_t and UTF-16 character literals that are
>> prefixed with the u prefix.
>>
>> jchar is currently defined to be a typedef for unsigned short on all
>> platforms, but char16_t is a separate type and not a typedef for unsigned
>> short or jchar in C++11 and later. jchar should be changed to be a typedef
>> for wchar_t on Windows platforms and to be a typedef for char16_t on
>> non-Windows platforms that support the char16_t type. This change will
>> make it possible to define jchar character and string literals on Windows
>> platforms and on non-Windows platforms that support the C11 or C++11
>> standard.
>>
>> The JCHAR_LITERAL macro should be added to the JNI header and defined as
>> follows on Windows:
>>
>> #define JCHAR_LITERAL(x) L ## x
>>
>> The JCHAR_LITERAL macro should be added to the JNI header and defined as
>> follows on non-Windows platforms:
>>
>> #define JCHAR_LITERAL(x) u ## x
>>
>> Here is how the Unicode version of JNI_CreateJavaVM and
>> JNI_GetDefaultJavaVMInitArgs could be defined:
>>
>> typedef struct JavaVMUnicodeOption {
>>     const jchar *optionString; /* the option as a string in UTF-16 encoding */
>>     void *extraInfo;
>> } JavaVMUnicodeOption;
>>
>> typedef struct JavaVMUnicodeInitArgs {
>>     jint version;
>>     jint nOptions;
>>     JavaVMUnicodeOption *options;
>>     jboolean ignoreUnrecognized;
>> } JavaVMUnicodeInitArgs;
>>
>> jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args);
>> jint JNI_GetDefaultJavaVMInitArgsUnicode(void *args);
>>
>> The java.exe wrapper should use wmain instead of main on Windows platforms,
>> and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows
>> platforms. This change, along with support for Unicode-enabled versions
>> of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would
>> allow the JVM to be launched with arguments that contain Unicode characters
>> that are not in the platform-default encoding.
>>
>> All of the Windows platforms that Java SE 10 and later VMs would be
>> supported on do support Unicode. Adding support for Unicode versions of
>> JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode
>> characters that are not in the platform-default encoding on Windows
>> platforms to be supported in command-line arguments that are passed to
>> the JVM.
>>
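
To make the proposal in the original message more concrete, here is a rough sketch of how a caller might populate the proposed structures with UTF-16 option strings. Everything in it is an assumption made for illustration: the jint, jboolean and jchar typedefs are stand-ins so that the code builds without jni.h, SKETCH_JNI_VERSION_1_8 mirrors the value of JNI_VERSION_1_8, the helpers jchar_length and fill_unicode_args are hypothetical, and the non-Windows form of JCHAR_LITERAL is used. The Unicode entry points proposed in the thread are not part of any JDK, so the sketch stops before calling them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the jni.h types so the sketch compiles without a JDK. */
typedef int32_t jint;
typedef unsigned char jboolean;
/* In C11, char16_t is the same type as uint_least16_t, so a u"..." literal
 * can be assigned to a const jchar pointer under this stand-in typedef. */
typedef uint_least16_t jchar;

/* Same value as JNI_VERSION_1_8 in jni.h; renamed to mark it as a stand-in. */
#define SKETCH_JNI_VERSION_1_8 0x00010008

/* Non-Windows form of the macro proposed in the thread. */
#define JCHAR_LITERAL(x) u ## x

/* Structures as proposed in the thread; not part of the real JNI. */
typedef struct JavaVMUnicodeOption {
    const jchar *optionString; /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMUnicodeOption;

typedef struct JavaVMUnicodeInitArgs {
    jint version;
    jint nOptions;
    JavaVMUnicodeOption *options;
    jboolean ignoreUnrecognized;
} JavaVMUnicodeInitArgs;

/* Hypothetical helper: number of code units before the terminating zero. */
size_t jchar_length(const jchar *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: points each option at the caller's UTF-16 string
 * (no copy is made) and fills in the init-args header. */
void fill_unicode_args(JavaVMUnicodeInitArgs *args, JavaVMUnicodeOption *opts,
                       const jchar *const *strings, jint count)
{
    assert(args != NULL && opts != NULL && strings != NULL && count >= 0);

    for (jint i = 0; i < count; i++) {
        opts[i].optionString = strings[i];
        opts[i].extraInfo = NULL;
    }
    args->version = SKETCH_JNI_VERSION_1_8;
    args->nOptions = count;
    args->options = opts;
    args->ignoreUnrecognized = 0;
}
```

A caller would build an array such as { JCHAR_LITERAL("-Xmx64m"), JCHAR_LITERAL("-Dname=\u03b1") }, pass it to fill_unicode_args, and then hand the address of the resulting structure to the proposed JNI_CreateJavaVMUnicode. The second option contains U+03B1, which is not representable in Cp1252, so it is the kind of argument the current ANSI-based path cannot carry on an English-locale Windows system.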
