Since the introduction of compact strings in Java 9, a String stores its contents as a byte array encoded in either Latin-1 or UTF-16. Here is the troublesome part: Latin-1 is not compatible with UTF-8. Not every Latin-1 byte sequence is a legal UTF-8 byte sequence; the two encodings only agree within the ASCII range. For code points greater than 127, Latin-1 uses a one-byte representation while UTF-8 requires two bytes.
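To make the incompatibility concrete, here is a small self-contained demo using only standard JDK APIs:

```java
import java.nio.charset.StandardCharsets;

public class Latin1VsUtf8 {
    public static void main(String[] args) {
        String s = "é"; // U+00E9: representable in Latin-1, but above the ASCII range
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Latin-1 uses one byte (0xE9); UTF-8 needs two (0xC3 0xA9).
        System.out.printf("Latin-1: %d byte(s): 0x%02X%n",
                latin1.length, latin1[0] & 0xFF);
        System.out.printf("UTF-8:   %d byte(s): 0x%02X 0x%02X%n",
                utf8.length, utf8[0] & 0xFF, utf8[1] & 0xFF);
        // Reinterpreting the Latin-1 byte as UTF-8 fails: 0xE9 begins an
        // incomplete multi-byte sequence, so decoding yields U+FFFD.
        System.out.println(new String(latin1, StandardCharsets.UTF_8));
    }
}
```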
As an example, every time `JavaLangAccess::getBytesNoRepl` is called to convert a string to a UTF-8 byte array, the implementation has to call `StringCoding.hasNegatives` to scan the backing byte array and determine whether the string is pure ASCII before it can avoid an array copy. `str.getBytes(UTF_8)` performs a similar scan.

So, would it be possible to introduce a third value for `String::coder`: ASCII? This looks like an attractive option, and it would let us add fast paths to many methods. I realize the change would not be free: some methods might see a slight regression from the extra coder check, and StringBuilder/StringBuffer in particular could be affected. However, given that UTF-8 is by far the most commonly used file encoding, the fast paths would likely pay off in far more scenarios than the check costs. Other ASCII-compatible encodings, such as GB18030 (GBK) or the ISO 8859 variants, could benefit as well. And if frozen arrays were ever introduced into the JDK, even more scenarios could enjoy the improvement.

So I would like to ask: could the JDK improve String storage along these lines in the future? Has anyone explored this before? Sorry to bother you all, but I am very much looking forward to an answer to this question.
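P.S. For illustration, here is a rough sketch of the kind of fast path I have in mind. The constant `ASCII`, the method `encodeUtf8`, and the scalar `hasNegatives` below are hypothetical stand-ins, not the actual JDK internals:

```java
import java.util.Arrays;

// Sketch only: with a third coder value, the Latin-1 -> UTF-8 conversion
// could skip the per-call content scan for strings already known to be ASCII.
public class AsciiCoderSketch {
    static final byte LATIN1 = 0, UTF16 = 1, ASCII = 2; // ASCII is the proposed addition

    // Today's check: scan for any byte >= 0x80 (negative as a signed byte).
    static boolean hasNegatives(byte[] ba) {
        for (byte b : ba) {
            if (b < 0) return true;
        }
        return false;
    }

    static byte[] encodeUtf8(byte coder, byte[] value) {
        // Fast path: ASCII bytes are valid UTF-8 as-is, so no scan is needed.
        if (coder == ASCII || (coder == LATIN1 && !hasNegatives(value))) {
            return Arrays.copyOf(value, value.length);
        }
        if (coder == LATIN1) {
            // Slow path: expand bytes in 0x80..0xFF to the two-byte UTF-8 form.
            byte[] out = new byte[value.length * 2];
            int p = 0;
            for (byte b : value) {
                int c = b & 0xFF;
                if (c < 0x80) {
                    out[p++] = (byte) c;
                } else {
                    out[p++] = (byte) (0xC0 | (c >> 6));
                    out[p++] = (byte) (0x80 | (c & 0x3F));
                }
            }
            return Arrays.copyOf(out, p);
        }
        throw new UnsupportedOperationException("UTF-16 path omitted for brevity");
    }
}
```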