Hello Bernd!

Thank you for your comments!

I'm going to proceed with only the restriction part of the change for now, so no blind conversion of lower-case control chars will happen.

A system property will allow users to return to the previous, less restrictive behavior, should they decide to keep malformed patterns unchanged.

I'll post the updated webrev and CSR request shortly.

With kind regards,
Ivan


On 9/4/19 10:54 PM, Bernd Eckenfels wrote:
Hello,

Since not all combinations make sense (e.g. exception + convert), a multi-valued
property might be better:

jdk.regex.control=WARN|EXCEPTION|STANDARD|LEGACY

With EXCEPTION generating an error, STANDARD being the planned new default
(treating upper and lower case the same, and raising an error on all undefined
chars), LEGACY being the manual fallback to the current behavior, and WARN the
same fallback but with logging.

I guess some form of early feedback like EXCEPTION or WARN is needed, even when
it is between a rock and a hard place. Maybe have at least one iteration where
it defaults to LEGACY (plus a Release Notes announcement), then WARN, and then
finally STANDARD?
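
A rough sketch of how such a multi-valued property might be read. The property name jdk.regex.control and its four values are taken from the proposal above; the class, the enum, the method names, and the fallback to LEGACY for missing or unrecognized values are assumptions made for illustration, not existing JDK code.

```java
import java.util.Locale;

// Hypothetical helper; nothing like this exists in java.util.regex.
final class ControlCharMode {
    enum Mode { WARN, EXCEPTION, STANDARD, LEGACY }

    // Parses a raw property value case-insensitively. Returns dflt when the
    // value is null, empty, or not one of the four proposed names.
    static Mode parse(String raw, Mode dflt) {
        if (raw == null) {
            return dflt;
        }
        try {
            return Mode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return dflt;
        }
    }

    // Reads the proposed property; LEGACY as the default is an assumption
    // matching the "first iteration" suggested above.
    static Mode current() {
        return parse(System.getProperty("jdk.regex.control"), Mode.LEGACY);
    }

    public static void main(String[] args) {
        System.out.println("mode = " + current());
    }
}
```

Running with -Djdk.regex.control=warn would select WARN under this sketch; a single enum keeps the meaningless combinations (such as exception plus convert) unrepresentable.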

Regards,
Bernd


--
http://bernd.eckenfels.net

________________________________
From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of Ivan
Gerasimov <ivan.gerasi...@oracle.com>
Sent: Thursday, September 5, 2019 4:00 AM
To: Martin Buchholz; Stuart Marks
Cc: core-libs-dev
Subject: Re: RFR 8230365 : Pattern for a control-char matches non-control
characters

Thank you Martin!

On 8/30/19 6:19 PM, Martin Buchholz wrote:
There's a strong expectation that ctrl-A and ctrl-a both map to
\u0001, so I support Ivan's initiative.
I'm surprised Java regex gets this wrong.
Might need a transitional system property.

Right.  I think it would be best to introduce two system properties:

The first, to turn on/off the restrictions on the control-char names.
This will be enabled by default, and will permit names from the limited
list: capital letters and a few other special characters.

The second one, to enable mapping of lower-case control-char names to
their upper-case counterpart.  This option should be turned off by
default for the current release of JDK, and then turned on by default
for some subsequent release (when, presumably, most applications that
use this kind of regexp are fixed).

This all feels like a little bit too much for such a rarely used
feature, but probably is a proper thing to do anyway :-)

If we have an agreement on these system properties, I can create a
separate test to verify all possible combinations.


What's the best bit-twiddle?  Untested:
if ((c ^= 0x40) < 0x20) return c;
if ((c ^= 0x20) <= 26 && c > 0) return c;

0x40 is more readable than 64.
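
For illustration, the two-step XOR above can be wrapped in a small method and exercised. The class name, the method name, and the -1 return for rejected names are assumptions; the snippet in the mail does not say what happens when neither test succeeds. Note that, as written, this variant does not accept '?'.

```java
// Hypothetical wrapper around the untested bit-twiddle quoted above.
final class ControlCharTwiddle {
    // Maps a control-char name to its control character, or returns -1
    // (an assumed convention) when the name is not accepted.
    static int map(int c) {
        // '@'..'_' (0x40..0x5f) become 0x00..0x1f after flipping bit 0x40.
        if ((c ^= 0x40) < 0x20) return c;
        // c has already been XORed with 0x40, so this is the original
        // value ^ 0x60: 'a'..'z' (0x61..0x7a) become 0x01..0x1a.
        if ((c ^= 0x20) <= 26 && c > 0) return c;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(map('A') == map('a')); // both map to 0x01
        System.out.println(map('?'));             // rejected by this variant
    }
}
```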

`((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case)
ids.
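
The expression works because the bitwise OR of two ints is non-negative only when both operands are non-negative, so it is equivalent to 0x3f <= ch && ch <= 0x5f, i.e. '?' through '_'. A minimal sketch that checks this and applies the usual XOR with 0x40 follows; the class and method names and the exception on invalid names are assumptions, not the code from the webrev.

```java
// Hypothetical illustration of the branch-free range check quoted above.
final class ControlCharRange {
    // True for '?' (0x3f) through '_' (0x5f): both differences are
    // non-negative exactly when ch lies inside the range, and the OR of two
    // ints has its sign bit set if either operand is negative.
    static boolean isRegularName(int ch) {
        return ((ch - 0x3f) | (0x5f - ch)) >= 0;
    }

    // Maps a regular name to its control character by flipping bit 0x40,
    // so '?' yields 0x7f (DEL) and '@'..'_' yield 0x00..0x1f.
    static int toControl(int ch) {
        if (!isRegularName(ch)) {
            throw new IllegalArgumentException("Illegal control-char name: " + ch);
        }
        return ch ^ 0x40;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toControl('?'))); // 7f
        System.out.println(Integer.toHexString(toControl('A'))); // 1
    }
}
```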

Does ctrl-? get mapped to 0x7f?

Yes. I've got it in the test, at the end of line 4997.

Would you please help review the updated webrev:

http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/

With kind regards,

Ivan



--
With kind regards,
Ivan Gerasimov
