Hi Sherman,

1) If you can point out the regression test cases that are compromised by the 
fix, it would be very helpful;
2) From my understanding you can change the default encoding by starting java 
with -Dsun.jnu.encoding=UTF-8 - this is a well-known feature that never caused 
problems (javac doesn't have such a switch);
3) If you state that java is non-Unicode on Windows by nature - the issue 
JDK-8124977 is a feature, not a bug :)
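
To see what a given JVM actually reports for the property mentioned in item 2, a small probe along these lines could be used. This is only a sketch: the class name EncodingProbe and its helpers are made up for illustration, and sun.jnu.encoding is an internal property, so whether a -D override is honored may differ between releases.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the JDK or of the webrev.
class EncodingProbe {
    // Returns the value of a system property, or "(unset)" if it is absent.
    static String prop(String name) {
        String value = System.getProperty(name);
        return value == null ? "(unset)" : value;
    }

    // Renders a string as a space-separated list of Unicode code points,
    // so that mis-decoded command-line arguments are easy to spot.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("sun.jnu.encoding = " + prop("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + prop("file.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
        for (String arg : args) {
            System.out.println("arg: " + codePoints(arg));
        }
    }
}
```

Running it once plainly and once with -Dsun.jnu.encoding=UTF-8, passing a non-ASCII argument, shows whether the argument reaches main() with the expected code points.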

Thanks,
Vladimir.

-----Original Message-----
From: Xueming Shen [mailto:xueming.s...@oracle.com] 
Sent: Tuesday, February 23, 2016 8:54 PM
To: Vladimir Shcherbakov <vlas...@microsoft.com>
Cc: Naoto Sato <naoto.s...@oracle.com>; Kumar Srinivasan 
<kumar.x.sriniva...@oracle.com>; Martin Sawicki <marc...@microsoft.com>; 
core-libs-dev Libs <core-libs-dev@openjdk.java.net>
Subject: Re: RFR 8124977 cmdline encoding challenges on Windows

Vladimir,

sun.jnu.encoding is used by
JNU_NewStringPlatform/JNU_GetStringPlatformChars. The JNU_ pair is "widely" 
used by the various native library code to convert between jstring and 
native char*, with the assumption that the "platform encoding" for the native 
char* is the "default" encoding used by the underlying platform/OS APIs that 
take char* parameters or return char* values; in the case of Windows, it's the 
code page decided by the system locale. We have migrated certain areas 
completely to use the "W" version/WChar APIs, such as java.io and the system 
properties initialization, but I think lots of areas still work on the "A" 
APIs; in particular, I think the "char*" interface between the jvm and the 
libraries is still the "ansi" codepage, not utf8. Those that work on utf8 
have their names explicitly named as "xyzUTF" or similar.

For example, the "java_home_dir" path used in 
libjava/TimeZone.c/getSystemTimeZoneID/
TimeZone_md.c/findJavaTZ_md is encoded from jstring java_home to char* via 
JNU_GetStringPlatformChars.
Simply changing/hardcoding sun.jnu.encoding to utf8 will probably cause the 
timezone code to stop working if the java_home_dir path has some non-ascii 
characters in it (the jdk/jre is installed in a Japanese/Chinese directory, for 
example).
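
The failure mode can be sketched in plain Java, without any native code. The class and method names below are hypothetical, the path is invented, and ISO-8859-1 only stands in for a single-byte "ansi" code page because it is guaranteed to exist on every JVM. The point is just that bytes produced under one encoding and interpreted under another no longer name the same directory.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not code from the JDK.
class PlatformEncodingMismatch {
    // Encodes a string with one charset and decodes the bytes with another,
    // mimicking a jstring converted to char* under one assumption and then
    // read by an API that assumes a different encoding.
    static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Invented install path containing a non-ASCII character (e-acute).
        String javaHome = "C:\\jdk\\H\u00e9llo";

        // Both sides agree on the encoding: the path survives.
        String consistent = reinterpret(javaHome,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        // Encoded as UTF-8 but read as a single-byte code page: the
        // two UTF-8 bytes of e-acute turn into two unrelated characters.
        String mismatched = reinterpret(javaHome,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("consistent equals original: " + consistent.equals(javaHome));
        System.out.println("mismatched equals original: " + mismatched.equals(javaHome));
        System.out.println("original length " + javaHome.length()
                + ", mismatched length " + mismatched.length());
    }
}
```

An "A" API handed the mismatched bytes would look for a directory that does not exist, which is the kind of breakage described for the timezone code.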

A quick "grep" indicates that the java.desktop/windows/native/libawt/windows 
package makes heavy use of the JNU_ pair as well. I'm not sure if this awt 
implementation is still being used though :-)

Before we clear all these internal "StringPlatform" use cases (I'm not sure if 
they are also used by external), I don't think we can simply set the 
sun.jnu.encoding to utf8, though it's very attractive.

Thanks,
-Sherman

On 2/23/16 4:34 PM, Naoto Sato wrote:
> Hi Vladimir,
>
> I think it would work fine with the Java launcher, but what about 
> other areas, which may rely on the native encodings? Java runtime is 
> in itself a "non-Unicode" application, so still there may be the area 
> affected by hardcoding "UTF-8" as the native encoding. Have you 
> checked in such cases? Sherman, will you comment on this too?
>
> Naoto
>
> On 2/23/16 2:12 PM, Vladimir Shcherbakov wrote:
>> Hi Naoto,
>>
>> 1) The system locale determines which code page is used on the system 
>> by default on operating systems that use Unicode as their native 
>> encoding (all OSes  from Windows 2000 to Windows 10) to convert text 
>> data from Unicode to code page whenever dealing with legacy 
>> non-Unicode applications. Only applications that do not use Unicode 
>> as their default character-encoding mechanism are affected by this 
>> setting; therefore, applications that are already Unicode-encoded can 
>> safely ignore the value and functionality of this setting.
>>
>> 2) The fundamental representation of text in Windows NT-based 
>> operating systems is UTF-16, and the WCHAR data type is a UTF-16 code 
>> unit. The Java launcher, on the other hand, uses CHAR as a code unit - 
>> so to use the UNICODE charset with the Java launcher we had to encode the entire 
>> command line with UTF-8 (convert from UTF-16 to UTF-8). After that 
>> step we can state that Java launcher is Unicode-encoded and can 
>> safely ignore the value and functionality of the system locale. To 
>> let JVM know that we use UTF-8 as a default UNICODE encoding for 
>> platform string  - we assign the value to sprops.sun_jnu_encoding 
>> property (mac osx does the same) instead of reading system locale 
>> code page.
>>
>> The main idea of the fix was to change how java and javac work 
>> with the so-called platform string on Windows. Before the fix the 
>> platform string was read as ANSI encoded - that's why the system 
>> locale code page was very important. The sun.jnu.encoding property is 
>> responsible for storing the platform string encoding. On Windows the 
>> property could be set either from the system locale, which by design 
>> doesn't support UTF-8, or with the -Dsun.jnu.encoding switch; 
>> but the switch only works with java, not with javac, and the switch 
>> was useless for an ANSI encoded platform string.
>>
>> Thanks,
>> Vladimir.
>>
>> -----Original Message-----
>> From: Naoto Sato [mailto:naoto.s...@oracle.com]
>> Sent: Tuesday, February 23, 2016 10:47 AM
>> To: Kumar Srinivasan <kumar.x.sriniva...@oracle.com>; Vladimir 
>> Shcherbakov <vlas...@microsoft.com>; SHEN,XUEMING 
>> <xueming.s...@oracle.com>
>> Cc: Martin Sawicki <marc...@microsoft.com>; core-libs-dev Libs 
>> <core-libs-dev@openjdk.java.net>
>> Subject: Re: RFR 8124977 cmdline encoding challenges on Windows
>>
>> Hello,
>>
>> Sorry if this has already been discussed, but this is my first time 
>> looking at the fix. In java_props_md.c, sprops.sun_jnu_encoding is 
>> now always "UTF-8". Is it always the case? What if the system admin 
>> switches the locale for "non-Unicode" applications in the Windows 
>> control panel?
>>
>> Naoto
>>
>> On 2/22/16 8:00 AM, Kumar Srinivasan wrote:
>>>
>>> Hi Naoto, Sherman, can you please take a look?
>>> I tested with the jprt build and test; all tests pass.
>>>
>>> Hi Vladimir, et. al.,
>>>
>>> It appears that there have been more simplifications since the 
>>> previous webrev.04. :-)
>>>
>>> It would've helped if you had highlighted the changes you made from 
>>> the previous revision; unfortunately this is one of the deficiencies 
>>> of webrev.
>>>
>>> There are some inconsistencies in the coding conventions:
>>>
>>> parse_manifest.c
>>> + if (q == 0) return -1;
>>>
>>> we expect the return to be on the next line.
>>>
>>> similarly main.c
>>>
>>> if (0 == q)
>>> {
>>>
>>> I can fix those up. If I were to push this change, who should I 
>>> attribute the changes to ? ie. in the Contributed-by: line of the 
>>> commit info ?
>>> Please note these have to be email addresses of the contributors.
>>>
>>> Thanks
>>> Kumar
>>>
>>>> Hi Kumar,
>>>>
>>>> We posted another web review here:
>>>> http://cr.openjdk.java.net/~kshoop/8124977/webrev.05/
>>>>
>>>> The patch was successfully tested.
>>>>
>>>> Test details:
>>>> * Regression tests folder: jdk/test/tools/launcher/
>>>> * Builds were used: windows-x86_64-normal-server-fastdebug,
>>>> windows-x86_64-normal-server-release,
>>>> windows-x86-normal-server-release;
>>>> * Platforms were used:  Windows 7(64 bit), Windows 8.1, Windows 
>>>> Server
>>>> 2012 R2 DC, Windows 10 ;
>>>> * System locales were used: English (United States), Persian, 
>>>> Japanese (Japan), Chinese (Traditional, Taiwan), Russian (Russia);
>>>>
>>>> Thanks,
>>>> Vladimir.
>>>>
>>>> -----Original Message-----
>>>> From: Martin Sawicki
>>>> Sent: Thursday, January 14, 2016 11:34 AM
>>>> To: Kumar Srinivasan <kumar.x.sriniva...@oracle.com>; Vladimir 
>>>> Shcherbakov <vlas...@microsoft.com>
>>>> Cc: core-libs-dev Libs <core-libs-dev@openjdk.java.net>; Naoto Sato 
>>>> <naoto.s...@oracle.com>
>>>> Subject: RE: RFR 8124977 cmdline encoding challenges on Windows
>>>>
>>>> Thanks for the feedback.
>>>> Investigating the regression failure.
>>>> We'll get back as soon as we figure this out.  (and yes, we'll run 
>>>> this through some localized Windows VMs)
>>>>
>>>> Cheers
>>>>
>>>> -----Original Message-----
>>>> From: Kumar Srinivasan [mailto:kumar.x.sriniva...@oracle.com]
>>>> Sent: Tuesday, January 12, 2016 2:35 PM
>>>> To: Martin Sawicki <marc...@microsoft.com>; Vladimir Shcherbakov 
>>>> <vlas...@microsoft.com>
>>>> Cc: core-libs-dev Libs <core-libs-dev@openjdk.java.net>; Naoto Sato 
>>>> <naoto.s...@oracle.com>
>>>> Subject: Re: RFR 8124977 cmdline encoding challenges on Windows
>>>>
>>>> Hi Martin, Vladimir,
>>>>
>>>> It was suggested that this patch be tested on localized Windows 
>>>> machines and/or trying with the various Windows native encodings, 
>>>> appreciate if you can verify this as well.
>>>>
>>>> Thanks
>>>> Kumar
>>>>
>>>> On 1/11/2016 1:10 PM, Kumar Srinivasan wrote:
>>>>> Hi,
>>>>>
>>>>> Was on vacation, I started to prepare the patch from webrev.04 for 
>>>>> integration. Please note: made some adjustments to your patch to 
>>>>> pass jcheck, ie. usage of tabs and space at line endings, and 
>>>>> modifications to Copyright dates.
>>>>>
>>>>> Also fixed a minor bug on unix: replaced JLI_TRUE with JNI_TRUE.
>>>>> I have attached a patch for your reference.
>>>>>
>>>>> However, there is a regression test failure on Windows, 
>>>>> jdk/test/tools/launcher/I18NTest.java
>>>>>
>>>>> ---Test info----
>>>>> Executed command: C:\mmm\jdk\bin\javac.exe i18nH▒lloWorld.java
>>>>>
>>>>> ++++Test Output++++
>>>>> javac: file not found: i18nHélloWorld.java
>>>>> ----End test info-----
>>>>>
>>>>> Have you run all the launcher regression tests with this changeset ?
>>>>>
>>>>> Thanks
>>>>> Kumar
>>>>>
>>>>>> Hi Kumar, just wondering if there are any updates on processing 
>>>>>> this submission.
>>>>>> Thanks!
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Shcherbakov
>>>>>> Sent: Wednesday, November 25, 2015 2:38 PM
>>>>>> To: Kumar Srinivasan <kumar.x.sriniva...@oracle.com>; Martin 
>>>>>> Sawicki <marc...@microsoft.com>
>>>>>> Cc: Kirk Shoop <kirk.sh...@microsoft.com>; core-libs-dev Libs 
>>>>>> <core-libs-dev@openjdk.java.net>
>>>>>> Subject: RE: RFR 8124977 cmdline encoding challenges on Windows
>>>>>>
>>>>>> Hi Kumar,
>>>>>>
>>>>>> Please find updated webreview here:
>>>>>> http://cr.openjdk.java.net/~kshoop/8124977/webrev.04/
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Kumar Srinivasan [mailto:kumar.x.sriniva...@oracle.com]
>>>>>> Sent: Sunday, November 22, 2015 8:14 AM
>>>>>> To: Martin Sawicki <marc...@microsoft.com>
>>>>>> Cc: Kirk Shoop <kirk.sh...@microsoft.com>; Vladimir Shcherbakov 
>>>>>> <vlas...@microsoft.com>; core-libs-dev Libs 
>>>>>> <core-libs-dev@openjdk.java.net>
>>>>>> Subject: Re: RFR 8124977 cmdline encoding challenges on Windows
>>>>>>
>>>>>>
>>>>>> Hi Martin, et. al.,
>>>>>>
>>>>>> Sorry for not getting back earlier, I am very busy right now with 
>>>>>> my other large commitments for JDK9.
>>>>>>
>>>>>> I will sponsor this "enhancement/bug fix" sometime in the new 
>>>>>> year, meanwhile, there is the changeset  [1] which is likely to 
>>>>>> cause merge conflicts, and perhaps logic issues.
>>>>>>
>>>>>> Thanks
>>>>>> Kumar
>>>>>>
>>>>>> [1]
>>>>>> http://hg.openjdk.java.net/jdk9/dev/jdk/rev/3b201a9ef918
>>>>>>> Hi all
>>>>>>> Here's an updated webrev attempting to take into account the 
>>>>>>> various pieces of feedback we have received:
>>>>>>>
>>>>>>> Issue:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8124977
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~kshoop/8124977/webrev.03/
>>>>>>>
>>>>>>> (Vladimir Shcherbakov is now working on this from our side)
>>>>>>>
>>>>>>> Looking forward to any other feedback.
>>>>>>> Thanks
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: core-libs-dev
>>>>>>> [mailto:core-libs-dev-boun...@openjdk.java.net]
>>>>>>> On Behalf Of Kumar Srinivasan
>>>>>>> Sent: Thursday, June 25, 2015 6:26 AM
>>>>>>> To: Kirk Shoop (MS OPEN TECH) <kirk.sh...@microsoft.com>
>>>>>>> Cc: Valery Kopylov (Akvelon) <v-val...@microsoft.com>; 
>>>>>>> core-libs-dev Libs <core-libs-dev@openjdk.java.net>
>>>>>>> Subject: Re: RFR 8124977 cmdline encoding challenges on Windows
>>>>>>>
>>>>>>> Hi Kirk,
>>>>>>>
>>>>>>> Thanks for proposing this change.
>>>>>>>
>>>>>>> If you notice, all the posix calls are wrapped in JLI_*; this 
>>>>>>> gives us the ability to use "W" functions.  I almost got it 
>>>>>>> done, several years ago, but we upgraded to VS2010 and my work 
>>>>>>> based on
>>>>>>> VS2003 keeled over, meanwhile my focus was  "shifted" to 
>>>>>>> something else.
>>>>>>>
>>>>>>> main.c: is really envisioned to be a stub  compiled by the tool 
>>>>>>> launchers, like java, javac, javah, jar etc. I prefer to see all 
>>>>>>> the heavy logic in this file moved to the platform specific file
>>>>>>> windows/java_md.*
>>>>>>>
>>>>>>> For the reason specified above we need to move fprintf or any 
>>>>>>> naked posix calls to JLI_* indirections.
>>>>>>>
>>>>>>> I don't see any tests ? The tests must be written in java and 
>>>>>>> placed in jdk/test/tools/launcher, there is a helper framework 
>>>>>>> TestHelper.java.
>>>>>>>
>>>>>>> There are other changes in nio, charsets etc, this will be 
>>>>>>> reviewed by my colleague specializing in that area (Sherman) cc'ed.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Kumar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/22/2015 2:01 PM, Kirk Shoop (MS OPEN TECH) wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Issue:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8124977
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~kshoop/8124977/
>>>>>>>>
>>>>>>>> This webrev intends to address interaction between Windows 
>>>>>>>> console and java apps.
>>>>>>>>
>>>>>>>> Two switches were added that change the behavior of the launcher.
>>>>>>>> The defaults do not change the launcher behavior.
>>>>>>>>
>>>>>>>>        -Dwindows.UnicodeConsole=true - switches on Unicode 
>>>>>>>> support in the Windows console. This optional switch causes the 
>>>>>>>> launcher to call GetCommandLineW() and parse the arguments in 
>>>>>>>> unicode. It also modifies how the codepage for console output is 
>>>>>>>> selected.
>>>>>>>>
>>>>>>>>        -Dfile.encoding.unicode="UTF-8" - identifies Unicode 
>>>>>>>> charset to use; If not specified, UTF-8 is used by default.
>>>>>>>> Ignored when windows.UnicodeConsole is not set to true. When 
>>>>>>>> the first switch is used, this optional switch allows the 
>>>>>>>> codepage for console output to be controlled.
>>>>>>>>
>>>>>>>> I would like to get feedback on the approach here and any 
>>>>>>>> additional work that is required to solve these particular Unicode 
>>>>>>>> issues on Windows.
>>>>>>>>
>>>>>>>> Kirk
>>>
