This is an automated email from the ASF dual-hosted git repository.

skygo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/netbeans-native-launchers.git


The following commit(s) were added to refs/heads/master by this push:
     new 7d207b7  Make the Windows launcher work with Unicode paths
     new 51d230d  Merge pull request #7 from eirikbakke/unicodeLaunching
7d207b7 is described below

commit 7d207b7af59fa68cd3f049b7832f31feffbbd327
Author: Eirik Bakke <[email protected]>
AuthorDate: Mon Nov 28 15:37:53 2022 -0500

    Make the Windows launcher work with Unicode paths
    
    On Windows, the `C:\Users\<username>` path will often contain the user's
    real human name. When this path contains characters outside the current
    "ANSI code page" (an old DOS concept that predates Unicode), lots of
    different things break, and NetBeans will fail to work. See
    https://github.com/apache/netbeans/issues/4314.
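
To make the failure mode above concrete, here is a minimal, portable sketch. It is not code from this commit. It assumes a simplified single-byte code page that covers only U+0000..U+00FF (roughly Latin-1; the real Windows-1252 differs in the 0x80..0x9F range), and the helper names `encodeLatin1Lossy` and `encodeUtf8` are hypothetical. Characters outside the code page are replaced with '?', which is the symptom the old launcher code checked for, whereas UTF-8 can represent every code point.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encode code points into a simplified single-byte code
// page that covers only U+0000..U+00FF. Anything else is replaced with '?'.
std::string encodeLatin1Lossy(const std::vector<char32_t> &text) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    return out;
}

// Hypothetical helper: encode code points as UTF-8 (1 to 4 bytes each).
// Sketch only: no validation of surrogates or values above U+10FFFF.
std::string encodeUtf8(const std::vector<char32_t> &text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With a Cyrillic letter such as U+0416 in the input, `encodeLatin1Lossy` loses the character, while `encodeUtf8` keeps it as a two-byte sequence that still fits in an ordinary `char *` string.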
    
    The underlying problem is that both OpenJDK and the NetBeans launcher,
    like most old Windows software, tend to use the "ANSI" versions of Win32
    APIs rather than the "Wide Char" versions (e.g. GetCurrentDirectoryA
    instead of GetCurrentDirectoryW). About 3 years ago, Microsoft started
    recommending an easy solution: Rather than replacing "char *" with
    "wchar_t *" everywhere, we can now set UTF-8 as the default "code page"
    for all ANSI Win32 API calls, at the EXE file process level. We do [...]
    
    Since the NetBeans Windows launcher uses JNI to start Java in-process,
    the JVM inherits the same default code page setting, without the user
    having to change regional settings in the Control Panel. Everything then
    works consistently: passing of system properties into the JVM, loading of
    libraries from Unicode paths from the system classloader or JNA, and so
    on. This assumes that the JVM takes its Win32 code page setting from the
    "user" locale (the GetACP() Win32 function) rather than [...]
    
    We also set the UTF-8 code page for the Windows console, so that Unicode
    characters display correctly there.
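
As a variation on the console handling described above, here is a hedged sketch of a scope guard that switches the console to UTF-8 and restores the previous code pages afterwards. This is not what the commit does: its `setConsoleCodepage()` sets the code pages and leaves them in place. The names `ConsoleUtf8Guard`, `isUtf8CodePage` and `kUtf8CodePage` are hypothetical; the Win32 calls used are `GetConsoleOutputCP`, `GetConsoleCP`, `SetConsoleOutputCP` and `SetConsoleCP`. On non-Windows platforms the guard compiles to a no-op.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Numeric value of CP_UTF8 on Windows; named here so the sketch also
// compiles on other platforms.
constexpr unsigned kUtf8CodePage = 65001;

// Hypothetical helper: true if the given code page number is UTF-8.
bool isUtf8CodePage(unsigned codePage) {
    return codePage == kUtf8CodePage;
}

// Hypothetical RAII guard: switch the attached console to UTF-8 for the
// lifetime of the object, then restore whatever was set before.
class ConsoleUtf8Guard {
public:
    ConsoleUtf8Guard() {
#ifdef _WIN32
        // Both getters return 0 when no console is attached.
        previousOutput_ = GetConsoleOutputCP();
        previousInput_ = GetConsoleCP();
        SetConsoleOutputCP(kUtf8CodePage);
        SetConsoleCP(kUtf8CodePage);
#endif
    }

    ~ConsoleUtf8Guard() {
#ifdef _WIN32
        if (previousOutput_ != 0) {
            SetConsoleOutputCP(previousOutput_);
        }
        if (previousInput_ != 0) {
            SetConsoleCP(previousInput_);
        }
#endif
    }

    ConsoleUtf8Guard(const ConsoleUtf8Guard &) = delete;
    ConsoleUtf8Guard &operator=(const ConsoleUtf8Guard &) = delete;

private:
#ifdef _WIN32
    UINT previousOutput_ = 0;
    UINT previousInput_ = 0;
#endif
};
```

Restoring the code page avoids changing the behaviour of a shell that the launcher merely attached to. The trade-off is that anything the JVM prints after the guard is destroyed would no longer be shown as UTF-8, so the guard would have to live as long as the process writes to the console.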
    
    On the aforementioned recent Java versions, NetBeans should now run fine
    when there are Unicode characters in the NetBeans installation path, the
    JDK path, the user/cache directory paths, or in the java.io.tmpdir path
    (the latter sometimes being a problem for JNA, which is used by FlatLAF).
    This was tested on Java 17.0.5 with Cyrillic and Norwegian characters in
    the OS home directory path and all of the paths above, with different
    combinations of Cyrillic vs. US English code pages set [...]
---
 src/main/cpp/bootstrap/nbexec.exe.manifest |  6 +++
 src/main/cpp/bootstrap/utilsfuncs.cpp      | 15 ++++++
 src/main/cpp/harness/app.exe.manifest      |  6 +++
 src/main/cpp/ide/nblauncher.cpp            | 73 +++++++++++++-----------------
 src/main/cpp/ide/netbeans.exe.manifest     |  6 +++
 src/main/cpp/ide/netbeans64.exe.manifest   |  6 +++
 6 files changed, 70 insertions(+), 42 deletions(-)

diff --git a/src/main/cpp/bootstrap/nbexec.exe.manifest b/src/main/cpp/bootstrap/nbexec.exe.manifest
index cfc9190..580bb41 100644
--- a/src/main/cpp/bootstrap/nbexec.exe.manifest
+++ b/src/main/cpp/bootstrap/nbexec.exe.manifest
@@ -48,6 +48,12 @@
       </requestedPrivileges>
      </security>
 </trustInfo>
+<!-- See https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page -->
+<application>
+  <windowsSettings>
+    <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+  </windowsSettings>
+</application>
 <!-- NETBEANS-1227: Indicate the same HiDPI capabilities as javaw.exe from JDK 11. -->
 <asmv3:application xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
   <asmv3:windowsSettings xmlns:dpi1="http://schemas.microsoft.com/SMI/2005/WindowsSettings" xmlns:dpi2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
diff --git a/src/main/cpp/bootstrap/utilsfuncs.cpp b/src/main/cpp/bootstrap/utilsfuncs.cpp
index 2902b1e..16c6ce0 100644
--- a/src/main/cpp/bootstrap/utilsfuncs.cpp
+++ b/src/main/cpp/bootstrap/utilsfuncs.cpp
@@ -276,6 +276,19 @@ bool checkLoggingArg(int argc, char *argv[], bool delFile) {
     return true;
 }
 
+void setConsoleCodepage() {
+    /* The Windows console (cmd) has its own code page setting that's usually different from the
+    system and user code page, e.g. on US Windows the console will use code page 437 while the
+    rest of the system uses 1252. Setting the console code page here to UTF-8 makes Unicode
+    characters printed from the application appear correctly. Since the launcher itself also runs
+    with UTF-8 as its code page (specified in the application manifest), this also makes log
+    messages from the launchers appear correctly, e.g. when printing paths that may have Unicode
+    characters in them. Note that if we attached to an existing console, the modified code page
+    setting will persist after the launcher exits. */
+    SetConsoleOutputCP(CP_UTF8);
+    SetConsoleCP(CP_UTF8);
+}
+
 bool setupProcess(int &argc, char *argv[], DWORD &parentProcID, const char *attachMsg) {
 #define CHECK_ARG \
     if (i+1 == argc) {\
@@ -290,6 +303,7 @@ bool setupProcess(int &argc, char *argv[], DWORD &parentProcID, const char *atta
             CHECK_ARG;
             if (strcmp("new", argv[i + 1]) == 0){
                 AllocConsole();
+                setConsoleCodepage();
             } else if (strcmp("suppress", argv[i + 1]) == 0) {
                 // nothing, no console should be attached
             } else {
@@ -332,6 +346,7 @@ bool setupProcess(int &argc, char *argv[], DWORD &parentProcID, const char *atta
                     logErr(true, false, "AttachConsole of PP failed.");
                 } else {
                     getParentProcessID(parentProcID);
+                    setConsoleCodepage();
                     if (attachMsg) {
                         printToConsole(attachMsg);
                     }
diff --git a/src/main/cpp/harness/app.exe.manifest b/src/main/cpp/harness/app.exe.manifest
index 26921b3..c1843b2 100644
--- a/src/main/cpp/harness/app.exe.manifest
+++ b/src/main/cpp/harness/app.exe.manifest
@@ -48,6 +48,12 @@
       </requestedPrivileges>
      </security>
 </trustInfo>
+<!-- See https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page -->
+<application>
+  <windowsSettings>
+    <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+  </windowsSettings>
+</application>
 <!-- NETBEANS-1227: Indicate the same HiDPI capabilities as javaw.exe from JDK 11. -->
 <asmv3:application xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
   <asmv3:windowsSettings xmlns:dpi1="http://schemas.microsoft.com/SMI/2005/WindowsSettings" xmlns:dpi2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
diff --git a/src/main/cpp/ide/nblauncher.cpp b/src/main/cpp/ide/nblauncher.cpp
index 2bd940e..393e2cd 100644
--- a/src/main/cpp/ide/nblauncher.cpp
+++ b/src/main/cpp/ide/nblauncher.cpp
@@ -25,6 +25,7 @@
 #endif
 
 #include <shlobj.h>
+#include <winnls.h>
 #include "nblauncher.h"
 #include "../bootstrap/utilsfuncs.h"
 #include "../bootstrap/argnames.h"
@@ -157,6 +158,20 @@ int NbLauncher::start(int argc, char *argv[]) {
     return loader.start(nbexecPath.c_str(), newArgs.getCount(), newArgs.getArgs());
 }
 
+UINT GetAnsiCodePageForLocale(LCID lcid) {
+    // See https://devblogs.microsoft.com/oldnewthing/20161007-00/?p=94475
+    UINT acp;
+    int sizeInChars = sizeof(acp) / sizeof(TCHAR);
+    if (GetLocaleInfo(lcid,
+                      LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
+                      reinterpret_cast<LPTSTR>(&acp),
+                      sizeInChars) != sizeInChars)
+    {
+        return 0;
+    }
+    return acp;
+}
+
 bool NbLauncher::initBaseNames() {
     char path[MAX_PATH] = "";
     getCurrentModulePath(path, MAX_PATH);
@@ -181,49 +196,23 @@ bool NbLauncher::initBaseNames() {
     }
     *bslash = '\0';        
 
+    /* Useful messages for debugging character set issues. On Java versions where
+    https://bugs.openjdk.org/browse/JDK-8272352 has been fixed, NetBeans should now run fine when
+    there are Unicode characters in the NetBeans installation path, the JDK path, the user/cache
+    directory paths, or in the java.io.tmpdir path (the latter sometimes being a problem for JNA,
+    which is used by FlatLAF). Since the JVM is started in-process via JNI, the Java environment
+    will inherit the UTF-8 code page setting that we have set in the launcher's application
+    manifest, without requiring the user to change regional settings in the Control Panel. (JEP 400
+    might eventually do something similar for the java.exe/javaw.exe executables. See
+    https://www.mail-archive.com/[email protected]/msg80489.html .) */
+    logMsg("ANSI code page per GetACP()              : %d", GetACP());
+    logMsg("ANSI code page per GetConsoleCP()        : %d", GetConsoleCP());
+    logMsg("ANSI code page for GetThreadLocale()     : %d", GetAnsiCodePageForLocale(GetThreadLocale()));
+    logMsg("ANSI code page for GetUserDefaultLCID()  : %d", GetAnsiCodePageForLocale(GetUserDefaultLCID()));
+    logMsg("ANSI code page for GetSystemDefaultLCID(): %d", GetAnsiCodePageForLocale(GetSystemDefaultLCID()));
+
     baseDir = path;
-    
-    /* The JavaVMOption.optionString interface forces us to stick to ANSI
-    strings only, using whichever codepage is the default on the current Windows
-    installation (e.g. windows-1252 for US Windows). For any Unicode characters
-    that cannot be encoded using the current ANSI codepage, Win32 functions
-    such as GetModuleFileName (used by getCurrentModulePath) and
-    GetCurrentDirectory will substitute a question mark, which we detect here.
-    Note that the ANSI codepage is a superset of ASCII; it can accomodate a
-    limited selection of international characters that Microsoft once considered
-    appropriate for the current Windows locale.
-
-    It would be easy enough to switch the launcher process to UTF-8 everywhere;
-    this can be configured from the manifest file
-    (see https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page ).
-    String types in these sources could remain as "char *" rather than
-    wchar_t. The problem is that JNI will still seems to expect parameters to be
-    passed using the default Windows codepage.
-
-    I tried setting UTF8 in the manifests and using the --fork-java parameter
-    to use the old CreateProcess launcher rather than JNI, but this still
-    causes a "Could not find or load main class org.netbeans.Main" error. I
-    also tried doing MultiByteToWideChar from UTF8 to wchar_t and calling
-    CreateProcessW; it does not fix the problem even though changing the command
-    line to prefix "cmd /c echo" causes my Cyrillic test character to show up
-    correctly on the Windows command line.
-
-    Other approaches which were attempted, but demeed too fragile:
-    1) Set the current directory to baseDir and pass relative paths only.
-       (Still led to ClassNotFoundException from ProxyClassLoader, which would
-       have needed to be fixed. And doesn't work e.g. for the home directory,
-       e.g. if the username itself has problematic characters in it.)
-    2) Using the GetShortPathNameW function to get an equivalent
-       Windows 95 style "8.3" compatibility path (e.g. "C:\Users\CHARTE~1").
-       This worked, but is too likely to create problems down the line.
-    */
-    for (size_t i = 0; i < baseDir.size(); ++i) {
-        if (baseDir[i] == '?') {
-            logErr(false, true, "Cannot run in this folder; the path \"%s\" contains problematic characters.", path);
-            return false;
-        }
-    }
-    
+
     logMsg("Base dir: %s", baseDir.c_str());
     return true;
 }
diff --git a/src/main/cpp/ide/netbeans.exe.manifest b/src/main/cpp/ide/netbeans.exe.manifest
index 71b1164..2dda7fb 100644
--- a/src/main/cpp/ide/netbeans.exe.manifest
+++ b/src/main/cpp/ide/netbeans.exe.manifest
@@ -48,6 +48,12 @@
       </requestedPrivileges>
      </security>
 </trustInfo>
+<!-- See https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page -->
+<application>
+  <windowsSettings>
+    <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+  </windowsSettings>
+</application>
 <!-- NETBEANS-1227: Indicate the same HiDPI capabilities as javaw.exe from JDK 11. -->
 <!-- Note that even 32-bit Java 10.0.2 indicates HiDPI-awareness, so it should
      be fine to include it here as well. -->
diff --git a/src/main/cpp/ide/netbeans64.exe.manifest b/src/main/cpp/ide/netbeans64.exe.manifest
index 3f7dc6e..b1d9a5f 100644
--- a/src/main/cpp/ide/netbeans64.exe.manifest
+++ b/src/main/cpp/ide/netbeans64.exe.manifest
@@ -50,6 +50,12 @@
       </requestedPrivileges>
      </security>
 </trustInfo>
+<!-- See https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page -->
+<application>
+  <windowsSettings>
+    <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+  </windowsSettings>
+</application>
 <!-- NETBEANS-1227: Indicate the same HiDPI capabilities as javaw.exe from JDK 11. -->
 <asmv3:application xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
   <asmv3:windowsSettings xmlns:dpi1="http://schemas.microsoft.com/SMI/2005/WindowsSettings" xmlns:dpi2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">


