Thank you again, Martin!
Here's the updated webrev without the lower-case control char ids:
http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/
I've also filed a CSR to record the changes in behavior:
https://bugs.openjdk.java.net/browse/JDK-8230675
Could you please help review it?
On 9/4/19 9:00 PM, Martin Buchholz wrote:
Thanks, Ivan. We're mostly in agreement.
+ * If {@code true} then lower-case control-character ids are mapped to the
+ * their upper-case counterparts.
Extra "the".
After all these decades I only now realize that c ^= 0x40 moves '?' to
the end of the ASCII range and all the other controls to the start!
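To make that observation concrete, here is a small illustrative sketch. It is not taken from the webrev, and the class and method names (ControlXorDemo, toControl) are made up for this example. It applies the same XOR with 0x40 to a few id characters: '@' through '_' land on 0x00..0x1F, '?' lands on 0x7F (DEL), and a lower-case id such as 'a' lands on 0x21 ('!'), which is not a control character at all.

```java
// Illustrative only: shows where (id ^ 0x40) sends various control-id characters.
// ControlXorDemo and toControl are hypothetical names, not part of the patch.
class ControlXorDemo {

    // The bare mapping discussed above, with no validation of the id.
    static int toControl(int id) {
        return id ^ 0x40;
    }

    public static void main(String[] args) {
        // Upper-case ids and '?' map into the ASCII control range;
        // the lower-case 'a' does not.
        int[] ids = {'@', 'A', 'Z', '_', '?', 'a'};
        for (int id : ids) {
            System.out.printf("\\c%c -> 0x%02X%n", (char) id, toControl(id));
        }
    }
}
```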
Should we support lower-case controls? Compatibility with perl regex
still matters, but a lot less than in 2003. But the key is that we
got the WRONG ANSWER previously, so when we restrict the control ids
let's just make lower case controls syntax errors. Silently changing
behavior is bad for users. ... so let's abandon
ALLOW_LOWERCASE_CONTROL_CHAR_IDS.
An alternative:
int ch = read() ^ 0x40;
if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch;
This code will probably be most efficient for the common case.
However, I'd prefer to use the auxiliary method isCntrlId() in this
case, as it is self-documenting and still efficient enough.
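For readers following along, a rough sketch of what the helper-based variant might look like. The body of isCntrlId() is an assumption based only on the condition quoted above; the actual method in the webrev may differ. ControlIdSketch, control(), and the exception type are stand-ins for illustration, and RESTRICTED_CONTROL_CHAR_IDS is declared locally here only so the example compiles.

```java
// A sketch under stated assumptions, not the code from the webrev.
class ControlIdSketch {

    // Stand-in for the flag named in the thread; its real declaration is not shown there.
    static final boolean RESTRICTED_CONTROL_CHAR_IDS = true;

    // Assumed meaning: an id is valid if XOR-ing it with 0x40 yields
    // an ASCII control character (0x00..0x1F or 0x7F).
    static boolean isCntrlId(int ch) {
        int c = ch ^ 0x40;
        // The c >= 0 guard rejects negative inputs such as an end-of-input marker.
        return c >= 0 && (c < 0x20 || c == 0x7f);
    }

    // Hypothetical caller: maps a control id to its control character,
    // rejecting invalid ids when the restriction is enabled.
    static int control(int id) {
        if (RESTRICTED_CONTROL_CHAR_IDS && !isCntrlId(id)) {
            throw new IllegalArgumentException(
                "Illegal control escape id: 0x" + Integer.toHexString(id));
        }
        return id ^ 0x40;
    }
}
```

With the restriction on, control('A') gives 0x01 and control('?') gives 0x7f, while a lower-case id such as 'a' is rejected instead of silently mapping to '!'.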
With kind regards,
Ivan