Re: RFR: 8337077: Java uses wrong Charset in System.out when running on MINGW

Naoto Sato Tue, 20 Aug 2024 09:34:37 -0700

Hello,

As I commented in the bug report, I closed the issue as "not an issue", as LC_* environment values on Windows have never been supported (or even considered) as a means to set locale/encoding the way POSIX does. Supporting them would create an inconsistent state with Windows' own locale settings, which may cause unexpected behavior in applications.


> The "Use Unicode UTF-8 for worldwide language support" is in beta state
> for a very long time and for several major versions of Windows and
> Microsoft doesn't seem to have any plan to make it production ready and
> enabled by default.

I cannot speak for Microsoft, but they seem to have changed direction, moving away from the -W APIs toward the -A APIs with the UTF-8 code page, and they recommend setting the code page to UTF-8 for Unix-like applications:

https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis

---

Use UTF-8 character encoding for optimal compatibility between web apps and other *nix-based platforms (Unix, Linux, and variants), minimize localization bugs, and reduce testing overhead.

---

For these two reasons, I am not sure it's worth enhancing the charset to support LC_* environment values on Windows.


Naoto


On 8/20/24 5:59 AM, Rostislav Krasny wrote:

Hello,
I'm the original author of the JDK-8337077 bug report. I reported it through your web site and have no account to comment on it at https://bugs.openjdk.org/browse/JDK-8337077
This bug report was closed by Naoto Sato as "Not an Issue" about a month ago without any discussion. I disagree with the closing reasons Naoto has written in his comment in that bug report.
The "Use Unicode UTF-8 for worldwide language support" is in beta state for a very long time and for several major versions of Windows and Microsoft doesn't seem to have any plan to make it production ready and enabled by default. Also, this Windows capability has nothing in common with my bug report and could be used as a workaround only.
When you enable that beta UTF-8 support, you enable it in the Windows console and not in the MINGW console. The MINGW console supports UTF-8 by default regardless of that option.
The right solution/fix should be as follows:
1. The JRE should check the OSTYPE environment variable to identify that it is running inside an MSYS2 console/environment.
2. If OSTYPE equals "msys", the JVM is running under MSYS2, and the right encoding of the current console should be retrieved from the LC_* environment variables, for example from LC_CTYPE. In my case LC_CTYPE="en_GB.UTF-8", meaning the right encoding is UTF-8.
3. That retrieved encoding should be used during initialization of both System.out and System.err instead of the usually different encoding that is reported by Windows directly.
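The three steps above can be sketched in plain Java. This is only an illustration of the proposal, not code from the JDK: the class name MsysConsoleCharset and its methods are hypothetical, the LC_ALL / LC_CTYPE / LANG lookup order follows the usual POSIX precedence, and it assumes that OSTYPE is actually visible in the environment of the Java process, as the message states.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed detection; not part of the JDK.
public class MsysConsoleCharset {

    /** Step 1: OSTYPE values taken from the message ("msys", and "cygwin" for Cygwin). */
    static boolean isPosixLikeShell(Map<String, String> env) {
        String osType = env.get("OSTYPE");
        return "msys".equals(osType) || "cygwin".equals(osType);
    }

    /** Extracts the codeset from a locale string such as "en_GB.UTF-8" or "de_DE.ISO-8859-1@euro". */
    static Optional<Charset> charsetFromLocale(String locale) {
        if (locale == null) {
            return Optional.empty();
        }
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return Optional.empty(); // e.g. "C" or "en_GB": no codeset given
        }
        String codeset = locale.substring(dot + 1);
        int at = codeset.indexOf('@'); // drop a modifier such as "@euro"
        if (at >= 0) {
            codeset = codeset.substring(0, at);
        }
        try {
            return Optional.of(Charset.forName(codeset));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }

    /** Step 2: the first non-empty variable among LC_ALL, LC_CTYPE, LANG decides (assumed POSIX order). */
    static Optional<Charset> consoleCharset(Map<String, String> env) {
        if (!isPosixLikeShell(env)) {
            return Optional.empty();
        }
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return charsetFromLocale(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Step 3 would use this value when System.out and System.err are initialized.
        Optional<Charset> detected = consoleCharset(System.getenv());
        System.out.println(detected.map(Charset::name)
                .orElse("no override; keep the encoding reported by Windows"));
    }
}
```

An empty result means "fall back to the current behavior", so a plain Windows console, where OSTYPE is not defined, would be unaffected by such a check.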
Currently the JVM identifies the console encoding with a method that is not relevant for the MINGW console. In most cases it picks up the wrong encoding from an unrelated Windows configuration.
Please reopen the JDK-8337077 bug report and make a real fix, i.e. add support for MSYS2/MINGW consoles.
I'm almost sure there is the same issue when Cygwin is used. In the case of Cygwin you should check that OSTYPE equals "cygwin" and the rest is the same.
By default Windows has no OSTYPE environment variable defined.
MSYS2 also defines an MSYSTEM environment variable that identifies a sub-type of MSYS2 (MINGW32, MINGW64, UCRT64, MSYS2, etc.), but the console is the same and configured similarly in all sub-types of MSYS2. Windows itself also doesn't have the MSYSTEM environment variable defined by default, and MSYS2 always defines both OSTYPE and MSYSTEM by itself.
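To see the mismatch described here on a given machine, a small diagnostic like the one below can be run from both a Windows console and an MSYS2 shell. It is a sketch only: the class name EncodingReport is hypothetical, and the native.encoding and stdout.encoding / stderr.encoding system properties exist only on newer JDK releases (JDK 17 and JDK 19 respectively, as far as I know), so on older runtimes they print as null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic; prints what the JVM and the environment each report.
public class EncodingReport {

    /** Collects the encoding-related values visible to the running JVM. */
    static Map<String, String> report() {
        Map<String, String> out = new LinkedHashMap<>();
        // System properties chosen by the JVM at startup ("null" if not defined).
        for (String key : new String[] {
                "native.encoding", "stdout.encoding", "stderr.encoding", "file.encoding"}) {
            out.put(key, String.valueOf(System.getProperty(key)));
        }
        out.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // Environment variables mentioned in this thread ("null" if not set).
        for (String key : new String[] {"OSTYPE", "MSYSTEM", "LC_ALL", "LC_CTYPE", "LANG"}) {
            out.put("env " + key, String.valueOf(System.getenv(key)));
        }
        return out;
    }

    public static void main(String[] args) {
        report().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

If the LC_* values name UTF-8 while the JVM properties report a Windows code page, that is the situation the bug report is about.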
