Hello,

As I commented in the bug report, I closed the issue as "Not an Issue" because LC_* environment variables on Windows have never been supported (or even considered) as a means of setting the locale/encoding the way POSIX does. Honoring them would create an inconsistent state with Windows' own locale settings, which may cause unexpected behavior in applications.

> The "Use Unicode UTF-8 for worldwide language support" is in beta state
> for a very long time and for several major versions of Windows and
> Microsoft doesn't seem to have any plan to make it production ready and
> enabled by default.

I cannot speak for Microsoft, but they seem to have changed direction, moving away from the -W APIs toward the -A APIs with the UTF-8 code page, and they recommend setting the code page to UTF-8 for Unix-like applications:
https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis

---
Use UTF-8 character encoding for optimal compatibility between web apps and other *nix-based platforms (Unix, Linux, and variants), minimize localization bugs, and reduce testing overhead.
---

For these two reasons, I am not sure it's worth enhancing the charset detection to support LC_* environment variables on Windows.

Naoto


On 8/20/24 5:59 AM, Rostislav Krasny wrote:
Hello,

I'm the original author of the JDK-8337077 bug report. I reported it through your web site and have no account to comment on it at https://bugs.openjdk.org/browse/JDK-8337077

This bug report was closed by Naoto Sato as "Not an Issue" about a month ago without any discussion. I disagree with the closing reasons Naoto has written in his comment in that bug report.

The "Use Unicode UTF-8 for worldwide language support" is in beta state for a very long time and for several major versions of Windows and Microsoft doesn't seem to have any plan to make it production ready and enabled by default. Also this Windows capability has nothing in common with my bug report and could be used as a workaround only.

When you enable that beta UTF-8 support, you enable it in the Windows console and not in the MINGW console. The MINGW console supports UTF-8 by default regardless of that option.

The right solution/fix should be as follows:

1. JRE should check the OSTYPE environment variable to identify that it is running inside an MSYS2 console/environment.
2. In case the OSTYPE equals "msys" JVM is running under MSYS2 and the right encoding of the current console should be retrieved from the LC_* environment variables, for example from LC_CTYPE. In my case LC_CTYPE="en_GB.UTF-8" meaning the right encoding is UTF-8.
3. That retrieved encoding should be used during initialization of both System.out and System.err instead of the usually different encoding that is reported by Windows directly.
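
The three steps above can be sketched in application-level Java roughly as follows. This is only an illustration of the proposal, not JDK code: the class and method names are hypothetical, the LC_ALL > LC_CTYPE > LANG lookup order is assumed from the usual POSIX precedence, and it assumes OSTYPE is actually exported into the process environment (if the shell only sets it as a shell variable, MSYSTEM may be the variable to test instead).

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in the JDK.
public class MsysConsoleEncodingSketch {

    // Assumed POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    private static final String[] LOCALE_VARS = {"LC_ALL", "LC_CTYPE", "LANG"};

    // Step 1: detect MSYS2 (or Cygwin) from OSTYPE.
    static boolean isMsysOrCygwin(Map<String, String> env) {
        String osType = env.get("OSTYPE");
        return "msys".equals(osType) || "cygwin".equals(osType);
    }

    // Step 2: extract the codeset from a value such as "en_GB.UTF-8" or "de_DE.UTF-8@euro".
    static Optional<Charset> charsetFromLocale(String locale) {
        if (locale == null) {
            return Optional.empty();
        }
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return Optional.empty(); // no codeset part, e.g. "C" or "en_GB"
        }
        String name = locale.substring(dot + 1);
        int at = name.indexOf('@');
        if (at >= 0) {
            name = name.substring(0, at); // drop a modifier such as "@euro"
        }
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: let the caller fall back.
            return Optional.empty();
        }
    }

    // Steps 1 and 2 combined: empty means "keep whatever Windows reports".
    static Optional<Charset> consoleCharset(Map<String, String> env) {
        if (!isMsysOrCygwin(env)) {
            return Optional.empty();
        }
        for (String var : LOCALE_VARS) {
            String value = env.get(var);
            if (value != null && !value.isEmpty()) {
                return charsetFromLocale(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Step 3 would use this charset when creating System.out and System.err.
        Optional<Charset> cs = consoleCharset(System.getenv());
        System.out.println(cs.map(Charset::name).orElse("no MSYS2/Cygwin locale detected"));
    }
}
```

An application can already do something similar on its own by wrapping System.out in a PrintStream created with the detected charset; the request here is for the JVM to do it during startup.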

Currently the JVM uses a method of console encoding identification that is not relevant to the MINGW console. In most cases it picks up the wrong encoding from an unrelated Windows configuration.

Please reopen the JDK-8337077 bug report and make a real fix, i.e. add support for MSYS2/MINGW consoles.

I'm almost sure there is the same issue when Cygwin is used. In the case of Cygwin you should check that OSTYPE equals "cygwin" and the rest is the same.

By default Windows has no OSTYPE environment variable defined.

MSYS2 also defines an MSYSTEM environment variable that identifies a sub-type of MSYS2 (MINGW32, MINGW64, UCRT64, MSYS2, etc.) but the console is the same and configured similarly in all sub-types of MSYS2. Windows itself also doesn't have the MSYSTEM environment variable defined by default and MSYS2 always defines both OSTYPE and MSYSTEM by itself.
