On 11-May-20 10:59, Alan Bateman wrote:
On 10/05/2020 15:47, Johannes Kuhn wrote:
:
After the discussion with Naoto, I would like to change the one line
to strcpy(ret + 2, "1252")
diff --git a/src/java.base/windows/native/libjava/java_props_md.c
b/src/java.base/windows/native/libjava/java_props_md.c
--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
LOCALE_IDEFAULTANSICODEPAGE,
ret+2, 14) == 0) {
codepage = 1252;
+ strcpy(ret + 2, "1252");
} else {
codepage = atoi(ret+2);
}
I have already signed the OCA.
This would be my first patch. Is there anything else you need?
I will take a further look into GetLocaleInfo, and try to find a
model that works for me (currently thinking about blackbox, can
return anything from [1]),
and then discussing how to handle all those cases.
It's not clear how conditions are necessary for GetLocaleInfo to fail
but good to fix this code path. We should probably create a new issue
in JIRA for this as it's not clear that it is the same issue as
JDK-8226810. We've been chasing JDK-8226810 for some time but have not
been able to find a system where it duplicates. The reports so far
suggests is Windows Chinese but most of the people that report the
issue to not respond to questions so we can't be sure. One of the
reports (see my comment in the bug from June 2019) suggests the code
page is 54936, useful information as GB18030 is not in java.base. Once
your fix is in then I think we should try to improve the exception
message rather than NPE. There may be some of these cases where it
should default to UTF-8.
-Alan.
I agree.
Looking through the JDK code, the IMHO best way to create a hook for
this is Charset.defaultCharset().
Instead of using GetPropertyAction, it could retrieve
System.getProperties and do a null check there. This has probably the
least impact on performance as this code path is only executed once.
Indeed, it could (theoretically) emit a warning/debug message there with
debug information and return sun.nio.cs.UTF_8.INSTANCE as fallback
without setting defaultCharset.
But that is a hard problem. I don't really like adding a warning to java
startup, even if it would not work before just to get debug information.
On the other hand, just continuing as if nothing did happen could hide
further bugs. (Also, how would I get the debug info from the Java side,
as properties are not yet initialized.)
I wrote a small C program which just outputs the result of
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...).
I was not able to influence that result - on my machine it's always 1252.
My current understanding of codepages is basically: They are like encodings.
They come in two variants (?): ANSI and non-ANSI. The ansi codepages use
ANSI for bytes between 0 and 127.
Each system, application or thread (wtf? see CP_THREAD_ACP) can have a
different current codepage.
Some windows API functions (the *A variants) use the current codepage
(as returned by GetACP) to translate the string from/to unicode.
GetACP returns the codepage identifier for the operating system. (No
mention of thread in the doc for that, so why is there even CP_THREAD_ACP?)
It's a rabbit hole.
If someone knows about a way to influence the result of
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...),
then at least reproducing the error would become possible.
- Johannes