Added back jdk10-dev as a bcc.

Added hotspot-dev and core-libs-dev (for launcher) for follow up discussions.

Hi John,

On 8/05/2017 10:33 AM, John Platts wrote:
I actually did a search through the code that implements
JNI_CreateJavaVM, and I found that the conversion of the strings is done
using java_lang_String::create_from_platform_dependent_str, which
converts from the platform-default encoding to Unicode. In the case of
Windows-based platforms, the conversion is done based on the ANSI
character encoding instead of UTF-8 or Modified UTF-8.


The platform encoding detection logic on Windows is implemented in
java_props_md.c, which can be found at
jdk/src/windows/native/java/lang/java_props_md.c in releases prior to
JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK
9 and later. The encoding used for command-line arguments passed into
the JNI invocation API is Cp1252 for English locales on Windows
platforms, and not Modified UTF-8 or UTF-8.


The documentation found at
http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html
also states that the strings passed into JNI_CreateJavaVM are in the
platform-default encoding.

Thanks for the additional details. I assume you are referring to:

typedef struct JavaVMOption {
char *optionString; /* the option as a string in the default platform encoding */

that comment should not form part of the specification as it is non-normative text. If the intent is truly to use the platform default encoding and not UTF-8 then that should be very clearly spelt out in the spec!

That said, the implementation is following this so it is a limitation. I suspect this is historical.

A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should
be added to the JNI Invocation API. The java.exe launchers and javaw.exe
launchers should also be updated to use the UTF-16 version of the
JNI_CreateJavaVM function on Windows platforms and to use wmain and
wWinMain instead of main and WinMain.

Why versions for UTF-16 instead of the missing UTF-8 variants? As I said the whole spec is intended to be based around UTF-8 so we would not want to throw in just a couple of UTF-16 based usages.

Thanks,
David


A few files in HotSpot would need to be changed in order to implement
the UTF-16 version of JNI_CreateJavaVM, but the change would improve
consistency across different locales on Windows platforms and allow
arguments that contain Unicode characters that are not available in the
platform-default encoding to be passed into the JVM on the command line.


The UTF-16-based version of JNI_CreateJavaVM also makes it easier to
allocate string objects that contain non-ASCII characters as the strings
are already in UTF-16 format, at least in cases where the strings
contain Unicode characters that are not in Latin-1 or on VMs that do not
support compact Latin-1 strings.


The UTF-16-based version of JNI_CreateJavaVM should probably be
implemented as a separate function so that the solution could be
backported to JDK 8 and JDK 9 updates and so that backwards
compatibility with the current JNI_CreateJavaVM implementation is
maintained.


Here is what the new UTF-16-based API might look like:

typedef struct JavaVMOption_UTF16 {
    const jchar *optionString;  /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMOptionUTF16;


typedef struct JavaVMInitArgs_UTF16 {
    jint version;
    jint nOptions;
    JavaVMOptionUTF16 *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgsUTF16;

/* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */

jint JNI_CreateJavaVM_UTF16(JavaVM **p_vm, void **p_env, void *vm_args);


/* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */

jint JNI_GetDefaultJavaVMInitArgs_UTF16(void *vm_args);

------------------------------------------------------------------------
*From:* David Holmes <david.hol...@oracle.com>
*Sent:* Thursday, May 4, 2017 11:07 PM
*To:* John Platts; jdk10-...@openjdk.java.net
*Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and
JNI_GetDefaultJavaVMInitArgs on Windows platforms

Hi John,

The JNI is defined to use Modified UTF-8 format for strings, so any
Unicode character should be handled if passed in in the right format.
Updating the JNI specification and implementation to accept UTF-16
directly would be a major undertaking.

Is the issue here that you want a tool, like the java launcher, to
accept arbitrary Unicode strings in an end-user-friendly manner and then
have it perform the modified UTF-8 conversion when invoking the VM?
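
For reference, the conversion mentioned above is small. The following is a minimal sketch, not code from the launcher or HotSpot: utf16_to_modified_utf8 is a hypothetical helper name, and the input is assumed to be an array of UTF-16 code units. It follows the Modified UTF-8 rules of the JNI specification: U+0000 becomes the two bytes 0xC0 0x80, and each surrogate is encoded on its own as three bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of JNI): encodes `len` UTF-16 code units
 * from `src` as Modified UTF-8 into `dst` and NUL-terminates the result.
 * Returns the number of bytes written, excluding the terminator, or
 * (size_t)-1 if `dst_cap` is too small.
 */
size_t utf16_to_modified_utf8(const uint16_t *src, size_t len,
                              char *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = src[i];
        size_t need = (c >= 0x0001 && c <= 0x007F) ? 1
                    : (c <= 0x07FF) ? 2 : 3;
        if (out + need >= dst_cap)      /* keep room for the terminator */
            return (size_t)-1;
        if (need == 1) {
            dst[out++] = (char)c;
        } else if (need == 2) {
            /* Covers U+0080..U+07FF and U+0000 (encoded as 0xC0 0x80). */
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        } else {
            /* Surrogates are encoded individually, three bytes each. */
            dst[out++] = (char)(0xE0 | (c >> 12));
            dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (out >= dst_cap)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

A launcher that received UTF-16 arguments could run each option string through such a routine before handing it to the VM, provided the VM interpreted the option strings as Modified UTF-8 rather than in the platform encoding.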

Can you give a concrete example of what you would like to be able to
pass as arguments to the JVM?

Thanks,
David

On 5/05/2017 1:04 PM, John Platts wrote:
The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI
invocation API expect ANSI strings on Windows platforms instead of
Unicode-encoded strings. This is an issue on Windows-based platforms since
some of the option strings that are passed into JNI_CreateJavaVM might
contain Unicode characters that are not in the ANSI encoding on Windows
platforms.


There is support for UTF-16 literals on Windows platforms with wchar_t and wide 
character literals prefixed with the L prefix, and on platforms that support 
C11 and C++11 with char16_t and UTF-16 character literals that are prefixed 
with the u prefix.


jchar is currently defined to be a typedef for unsigned short on all
platforms, but char16_t is a separate type and not a typedef for unsigned
short or jchar in C++11 and later. jchar should be changed to be a typedef
for wchar_t on Windows platforms and to be a typedef for char16_t on
non-Windows platforms that support the char16_t type. This change will make
it possible to define jchar character and string literals on Windows
platforms and on non-Windows platforms that support the C11 or C++11
standard.


The JCHAR_LITERAL macro should be added to the JNI header and defined as 
follows on Windows:

#define JCHAR_LITERAL(x) L ## x


The JCHAR_LITERAL macro should be added to the JNI header and defined as 
follows on non-Windows platforms:

#define JCHAR_LITERAL(x) u ## x
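
To illustrate how the two definitions would be used together, here is a minimal sketch. The names jchar_proposed, jchar_strlen and kOption are hypothetical and exist only for this example; jchar_proposed stands in for jchar as it would be typedef'd under the proposal, so the sketch compiles without <jni.h>.

```c
#include <stddef.h>

#ifdef _WIN32
#include <wchar.h>
typedef wchar_t jchar_proposed;     /* stand-in for the proposed jchar */
#define JCHAR_LITERAL(x) L ## x
#else
#include <uchar.h>
typedef char16_t jchar_proposed;    /* stand-in for the proposed jchar */
#define JCHAR_LITERAL(x) u ## x
#endif

/* Counts UTF-16 code units up to, but excluding, the terminating zero. */
size_t jchar_strlen(const jchar_proposed *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* An option string with two non-ASCII characters, written as universal
 * character names so the source file itself stays ASCII. */
static const jchar_proposed kOption[] =
    JCHAR_LITERAL("-Dgreeting=\u4F60\u597D");
```

With these definitions the same source line yields a UTF-16 literal on both kinds of platform, which is the point of the macro.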


Here is how the Unicode version of JNI_CreateJavaVM and 
JNI_GetDefaultJavaVMInitArgs could be defined:

typedef struct JavaVMUnicodeOption {
    const jchar *optionString;  /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMUnicodeOption;

typedef struct JavaVMUnicodeInitArgs {
    jint version;
    jint nOptions;
    JavaVMUnicodeOption *options;
    jboolean ignoreUnrecognized;
} JavaVMUnicodeInitArgs;

jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args);
jint JNI_GetDefaultJavaVMInitArgsUnicode(void *args);
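
As a rough sketch of how a caller might fill in the proposed structures: none of this exists in any JDK. The two structures are copied from the proposal above, the typedefs and constants at the top are stand-ins so that the example compiles without <jni.h>, and init_unicode_args is a hypothetical helper. jchar is assumed here to be char16_t, as suggested for non-Windows platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Stand-ins so the sketch compiles without <jni.h>; remove them when the
 * real header is included. */
typedef int32_t       jint;
typedef unsigned char jboolean;
typedef char16_t      jchar;        /* assumption: jchar as proposed */
#define JNI_VERSION_1_8 0x00010008
#define JNI_FALSE       0

/* Proposed structures, as given in the message (not an existing API). */
typedef struct JavaVMUnicodeOption {
    const jchar *optionString;  /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMUnicodeOption;

typedef struct JavaVMUnicodeInitArgs {
    jint version;
    jint nOptions;
    JavaVMUnicodeOption *options;
    jboolean ignoreUnrecognized;
} JavaVMUnicodeInitArgs;

/* Hypothetical helper: points `args` at `count` caller-owned UTF-16 option
 * strings, using `storage` (at least `count` elements) for the options. */
void init_unicode_args(JavaVMUnicodeInitArgs *args,
                       JavaVMUnicodeOption *storage,
                       const jchar *const *strings, jint count)
{
    for (jint i = 0; i < count; i++) {
        storage[i].optionString = strings[i];
        storage[i].extraInfo = NULL;
    }
    args->version = JNI_VERSION_1_8;
    args->nOptions = count;
    args->options = storage;
    args->ignoreUnrecognized = JNI_FALSE;
}
```

The filled-in structure would then be passed as the args parameter of the proposed JNI_CreateJavaVMUnicode function, in the same way JavaVMInitArgs is passed to JNI_CreateJavaVM today.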

The java.exe wrapper should use wmain instead of main on Windows platforms,
and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows
platforms. This change, along with the support for Unicode-enabled version of
the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow the
JVM to be launched with arguments that contain Unicode characters that are not
in the platform-default encoding.

All of the Windows platforms that Java SE 10 and later VMs would be supported
on do support Unicode. Adding support for Unicode versions of JNI_CreateJavaVM
and JNI_GetDefaultJavaVMInitArgs will allow Unicode characters that are not in
the platform-default encoding on Windows platforms to be supported in
command-line arguments that are passed to the JVM.
