Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

Jie Fu Thu, 30 Sep 2021 16:08:03 -0700

On Thu, 30 Sep 2021 17:53:39 GMT, Ioi Lam <[email protected]> wrote:

> The current limitations of the MethodMatcher class are:
> 
> [1] `parse_method_pattern(char*& line, ...)` requires `line` to be a 
> UTF8-encoded byte sequence. Essentially, this means that for 
> -XX:CompileCommand to support non-ASCII characters, the JVM (and the shell 
> that runs the JVM) must be using UTF8 character encoding.
> 
> Note that a "locale" contains 3 parts: language, country and character 
> encoding. For example,
> 
> * en_US.utf8 (English language, United States, UTF8 character encoding)
> * zh_CN.utf8 (Chinese language, China, UTF8 character encoding)
> * zh_CN.gbk (Chinese language, China, GBK character encoding)
> 
> The first two support non-ASCII characters in -XX:CompileCommand, but the 
> third one doesn't.
> 
> [2] MethodMatcher uses `sscanf`. It assumes that the JVM is running with UTF8 
> character encoding because
> 
> * It compares the `char*` strings returned by `sscanf` with the bytes 
> stored in Symbols. This requires `sscanf` to return strings that are encoded 
> in UTF8, because Symbols store their strings as UTF8-encoded bytes.
> * It relies on range checking by `sscanf` to enforce the following 
> restrictions. However, these restrictions are expressed by the RANGE macro 
> in terms of UTF8-encoded bytes (and I suspect that they are incorrect when 
> handling multi-byte UTF8-encoded characters):
> 
> ```
> // '\0' and 0xf0-0xff are disallowed in constant string values
> // 0x20 ' ', 0x09 '\t' and, 0x2c ',' are used in the matching
> // 0x5b '[' and 0x5d ']' can not be used because of the matcher
> // 0x28 '(' and 0x29 ')' are used for the signature
> // 0x2e '.' is always replaced before the matching
> // 0x2f '/' is only used in the class name as package separator
> ```
> 
> Proposed solutions:
> 
> I don't think we can solve [1] easily. To handle non-ASCII characters that 
> are not encoded in UTF8, we need to call NewPlatformString() in 
> src/java.base/share/native/libjli/java.c. However, this executes Java code, 
> but -XX:CompileCommand needs to be processed before any Java code execution. 
> ==> Proposal: IGNORE it for now.
> 
> For [2], there are two distinct issues:
> 
> (a) The restriction checks are invalid when the JVM is running in a non-UTF8 
> encoding -- this is a moot point because we can't handle [1] anyway, so the 
> data given to sscanf() is already bad. => Proposal: IGNORE it for now
> 
> (b) VC++ compilation warning when methodMatcher.cpp is compiled in non-UTF8 
> environments
> 
> This is just a warning, and (I think) it doesn't change the object file 
> at all. I.e., the literal strings in methodMatcher.obj are exactly the same 
> as if methodMatcher.cpp were compiled under a UTF8 environment.
> 
> Proposal: use pragma to disable the warning. Assuming that my analysis for 
> [1] and (a) is correct, there's no reason to fix the sscanf code. Disabling 
> the warnings with pragma is the most painless and easiest way to handle this.
> 
> @DamonFool could you try this experiment:
> 
> * Implement the pragma and build two JDKs -- one in a Chinese Windows 
> environment, and another in an English Windows environment
> * run `strings methodMatcher.obj` and see if the output is identical
> * run the "CJK" test example in my previous comment, and see if you get 
> identical results with both JDKs
>   
>   * On Windows, you may need to do this to force the terminal to use the 
> UTF8 code page. See 
> https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line
> 
> (If this doesn't work, an alternative is to avoid using sscanf and write our 
> own parser).
> 
> Thanks

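To make points [1] and [2] of the quoted analysis concrete, here is a minimal sketch (not HotSpot code). It assumes a byte-wise comparison stands in for matching against a Symbol, and a simplified negated scanset stands in for the RANGE macro. The names `kUtf8`, `kGbk`, `same_bytes` and `scan_name` are hypothetical and exist only for illustration.

```cpp
#include <cstdio>
#include <cstring>

// The character U+4E2D written as raw bytes in two encodings.
static const char kUtf8[] = "\xE4\xB8\xAD";  // UTF-8: three bytes
static const char kGbk[]  = "\xD6\xD0";      // GBK: two bytes

// Hypothetical stand-in for comparing parsed bytes with a Symbol's bytes:
// the comparison is purely byte-wise and knows nothing about encodings.
static bool same_bytes(const char* a, const char* b) {
  size_t len = std::strlen(a);
  return len == std::strlen(b) && std::memcmp(a, b, len) == 0;
}

// Hypothetical helper: read a name up to the first delimiter byte using a
// byte-oriented scanset (a simplified stand-in for the RANGE macro).
// 'out' must hold at least 256 bytes. Returns the number of bytes stored.
static int scan_name(const char* line, char* out) {
  out[0] = '\0';
  if (std::sscanf(line, "%255[^ \t,.()]", out) != 1) {
    return 0;
  }
  return static_cast<int>(std::strlen(out));
}
```

With UTF-8 input, `scan_name` returns the same three bytes that a Symbol would store, so `same_bytes` succeeds. The GBK bytes for the same character never compare equal, which is why a zh_CN.gbk shell cannot match such names.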

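For reference, the pragma approach proposed in (b) might look roughly like the sketch below. The warning number is an assumption: the thread does not name it, and C4819 (a character that cannot be represented in the current code page) is only the likely candidate. Whether a pragma actually suppresses that diagnostic is part of what the experiment would show. `sample_literal` is a hypothetical stand-in for the real string literals, written with hex escapes so the sketch itself stays ASCII-only.

```cpp
#include <cstring>

// Assumption: the VC++ diagnostic discussed in the thread is C4819.
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4819)
#endif

// Hypothetical stand-in for a literal holding non-ASCII (UTF-8) bytes.
static const char* sample_literal() {
  return "\xE4\xB8\xAD";
}

#ifdef _MSC_VER
#pragma warning(pop)
#endif
```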
Thanks @iklam for your excellent analysis.

So HotSpot does support non-ASCII chars as names.
Then we shouldn't simply remove such non-ASCII code.

I will run your experiment next week.
It is our National Day week, and I can't get access to an English 
Windows machine until then.
I'll let you know the result as soon as possible.
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5704
