The goal of the clause is to have a mechanism for using hex values for character literals. That is, you should be able to take any code point from 0 to 10FFFF, get its hex value, embed that in some syntax, concatenate the result into a pattern, and have it work as a literal.
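A minimal sketch of that requirement, using only the JDK: it embeds a code point once as itself and once as the four-digit regex escape \uXXXX, then checks whether "a" + piece + "b", compiled with Pattern.COMMENTS, matches the target exactly. The class and method names (HexEscapeDemo, bmpEscape, matchesAsLiteral) are hypothetical, and the escape only covers BMP code points, because \uXXXX takes exactly four hex digits.

```java
import java.util.regex.Pattern;

// Hypothetical demo: literal embedding versus a four-digit regex hex escape.
// Only BMP code points (0..FFFF) can be written with the four-digit escape.
final class HexEscapeDemo {

    // Builds the four-digit regex Unicode escape for a BMP code point.
    static String bmpEscape(int codePoint) {
        if (codePoint < 0 || codePoint > 0xFFFF) {
            throw new IllegalArgumentException("not a BMP code point: " + codePoint);
        }
        return String.format("\\u%04X", codePoint);
    }

    // True if the pattern "a" + piece + "b" (COMMENTS mode) matches "a" + char + "b" exactly.
    static boolean matchesAsLiteral(String piece, int codePoint) {
        String target = "a" + new String(Character.toChars(codePoint)) + "b";
        try {
            return Pattern.compile("a" + piece + "b", Pattern.COMMENTS)
                    .matcher(target)
                    .matches();
        } catch (RuntimeException e) { // e.g. PatternSyntaxException for a lone "("
            return false;
        }
    }

    public static void main(String[] args) {
        for (int cp : new int[] {'x', '*', '|', '(', ' '}) {
            String literal = new String(Character.toChars(cp));
            System.out.printf("U+%04X literal=%b escape=%b%n", cp,
                    matchesAsLiteral(literal, cp),
                    matchesAsLiteral(bmpEscape(cp), cp));
        }
    }
}
```

For syntax characters such as *, |, ( and (in COMMENTS mode) space, the literal form fails while the escaped form matches; that gap is what a hex notation is meant to close.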
For example:

    String pattern = first_part + "\\x{" + hex(myCodePoint) + "}" + second_part; // for *some* hex notation
    ...
    Matcher m = Pattern.compile(pattern, Pattern.COMMENTS).matcher(target);
    ...

As far as I can tell, Java doesn't supply that capability for non-BMP code points. The \u notation takes exactly four hex digits, so it doesn't work above FFFF. The only exception is that the preprocessor (the compiler's translation of \u escapes in source code) maps a surrogate pair written in hex to literal characters, which all happen to work because surrogates aren't syntax characters.

What you can do with Java is:

1. Embed the character itself, not its hex representation. This works some of the time; it fails for 18 characters, the syntax characters (in COMMENTS mode these include whitespace and #), as expected.
2. In constant expressions only, use the Java preprocessor with \u.... (or \u....\u.... for a surrogate pair).
3. For BMP characters, build the regex escape at runtime with "\\u" + hex(myCodePoint, 4).

Here is a quick and dirty test; let me know if I've missed something.

*Output:*

    LITERALS Failures: 18
        set: [\u0009-\u000D\ #\$(-+?\[\\\^\{|]
        example1: a b
        exampleN: a|b
    INLINE Failures: 1048576
        set: [\U00010000-\U0010FFFF]
        example1: a\uD800\uDC00b
        exampleN: a\uDBFF\uDFFFb
    INRANGE Failures: 1048576
        set: [\U00010000-\U0010FFFF]
        example1: a[\uD800\uDC00]b
        exampleN: a[\uDBFF\uDFFF]b

*Code:*

    import java.util.regex.Pattern;

    import com.ibm.icu.impl.Utility;     // ICU4J: Utility.hex(value, width) returns zero-padded hex
    import com.ibm.icu.text.UnicodeSet;

    // Assumes this method lives in an ICU4J test class (TestFmwk subclass), which supplies logln().
    public void TestRegex() {
        logln("Check patterns for Unicodeset");
        for (int i = 0; i <= 0x10FFFF; ++i) {
            // The goal is to make a regex with hex digits, and have it match the corresponding character.
            // We check two different environments: inline ("aXb") and in a range ("a[X]b").
            String s = new StringBuilder().appendCodePoint(i).toString();
            char[] units = Character.toChars(i); // one char for BMP, a surrogate pair above FFFF
            String hexPattern = i <= 0xFFFF
                    ? "\\u" + Utility.hex(i, 4)
                    : "\\u" + Utility.hex(units[0], 4) + "\\u" + Utility.hex(units[1], 4);
            String target = "a" + s + "b";
            Failures.LITERALS.checkMatch(i, "a" + s + "b", target);
            Failures.INLINE.checkMatch(i, "a" + hexPattern + "b", target);
            Failures.INRANGE.checkMatch(i, "a[" + hexPattern + "]b", target);
        }
        Failures.LITERALS.showFailures();
        Failures.INLINE.showFailures();
        Failures.INRANGE.showFailures();
    }

    enum Failures {
        LITERALS, INLINE, INRANGE;

        UnicodeSet failureSet = new UnicodeSet();
        String firstSampleFailure;
        String lastSampleFailure;

        void checkMatch(int codePoint, String pattern, String target) {
            if (!matches(pattern, target)) {
                failureSet.add(codePoint);
                if (firstSampleFailure == null) {
                    firstSampleFailure = pattern;
                }
                lastSampleFailure = pattern;
            }
        }

        boolean matches(String hexPattern, String target) {
            try {
                // use COMMENTS to get the 'worst case'
                return Pattern.compile(hexPattern, Pattern.COMMENTS).matcher(target).matches();
            } catch (Exception e) { // e.g. PatternSyntaxException for an unbalanced "(" or "["
                return false;
            }
        }

        void showFailures() {
            System.out.format("%s Failures: %s%n\tset: %s%n\texample1: %s%n\texampleN: %s%n",
                    this, failureSet.size(), failureSet, firstSampleFailure, lastSampleFailure);
        }
    }
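For readers on a newer JDK: starting with Java 7, java.util.regex.Pattern accepts \x{h...h} for any code point from 0 to 10FFFF, which is the kind of hex notation the clause asks for. The sketch below adapts the INLINE and INRANGE checks to that escape using only the JDK (no ICU); the class and method names are hypothetical, and it assumes Java 7 or later.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, assuming Java 7+, where Pattern supports the \x{h...h} escape
// for every code point from 0 to 10FFFF.
final class SupplementaryHexCheck {

    // Builds the variable-length regex hex escape, e.g. 0x1F600 -> \x{1F600}.
    static String hexEscape(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return String.format("\\x{%X}", codePoint);
    }

    // Compiles in COMMENTS mode (the 'worst case') and requires a full match.
    static boolean matches(String pattern, String target) {
        try {
            return Pattern.compile(pattern, Pattern.COMMENTS).matcher(target).matches();
        } catch (RuntimeException e) { // PatternSyntaxException for a malformed pattern
            return false;
        }
    }

    // Same two environments as the test above: inline ("aXb") and in a range ("a[X]b").
    static boolean worksInlineAndInRange(int codePoint) {
        String target = "a" + new String(Character.toChars(codePoint)) + "b";
        String hex = hexEscape(codePoint);
        return matches("a" + hex + "b", target) && matches("a[" + hex + "]b", target);
    }

    public static void main(String[] args) {
        int failures = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
            if (!worksInlineAndInRange(cp)) {
                ++failures;
            }
        }
        System.out.println("failures: " + failures);
    }
}
```

Unlike the surrogate-pair \u patterns in the output above, a single \x{...} escape names the whole code point, so it does not depend on how the regex engine treats escaped surrogates. Running main over the full range compiles about 2.2 million patterns, so it takes a little while.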