On 11-May-20 10:59, Alan Bateman wrote:


On 10/05/2020 15:47, Johannes Kuhn wrote:
:

After the discussion with Naoto, I would like to change the one line to strcpy(ret + 2, "1252")

diff --git a/src/java.base/windows/native/libjava/java_props_md.c b/src/java.base/windows/native/libjava/java_props_md.c
--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
                       LOCALE_IDEFAULTANSICODEPAGE,
                       ret+2, 14) == 0) {
         codepage = 1252;
+        strcpy(ret + 2, "1252");
     } else {
         codepage = atoi(ret+2);
     }

I have already signed the OCA.
This would be my first patch. Is there anything else you need?

I will take a further look into GetLocaleInfo, and try to find a model that works for me (currently thinking about blackbox, can return anything from [1]),
and then discussing how to handle all those cases.
It's not clear how conditions are necessary for GetLocaleInfo to fail but good to fix this code path. We should probably create a new issue in JIRA for this as it's not clear that it is the same issue as JDK-8226810. We've been chasing JDK-8226810 for some time but have not been able to find a system where it duplicates. The reports so far suggests is Windows Chinese but most of the people that report the issue to not respond to questions so we can't be sure. One of the reports (see my comment in the bug from June 2019) suggests the code page is 54936, useful information as GB18030 is not in java.base. Once your fix is in then I think we should try to improve the exception message rather than NPE. There may be some of these cases where it should default to UTF-8.

-Alan.

I agree.
Looking through the JDK code, the IMHO best way to create a hook for this is Charset.defaultCharset(). Instead of using GetPropertyAction, it could retrieve System.getProperties and do a null check there. This has probably the least impact on performance as this code path is only executed once. Indeed, it could (theoretically) emit a warning/debug message there with debug information and return sun.nio.cs.UTF_8.INSTANCE as fallback without setting defaultCharset.

But that is a hard problem. I don't really like adding a warning to java startup, even if it would not work before just to get debug information. On the other hand, just continuing as if nothing did happen could hide further bugs. (Also, how would I get the debug info from the Java side, as properties are not yet initialized.)

I wrote a small C program which just outputs the result of GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...).
I was not able to influence that result - on my machine it's always 1252.

My current understanding of codepages is basically: They are like encodings.
They come in two variants (?): ANSI and non-ANSI. The ansi codepages use ANSI for bytes between 0 and 127. Each system, application or thread (wtf? see CP_THREAD_ACP) can have a different current codepage. Some windows API functions (the *A variants) use the current codepage (as returned by GetACP) to translate the string from/to unicode. GetACP returns the codepage identifier for the operating system. (No mention of thread in the doc for that, so why is there even CP_THREAD_ACP?)

It's a rabbit hole.
If someone knows about a way to influence the result of GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...), then at least reproducing the error would become possible.

- Johannes

Reply via email to