Re: [jira] Commented: (HARMONY-688) java.util.regex.Matcher does not support Unicode supplementary characters

Richard Liang Thu, 29 Jun 2006 02:22:52 -0700


Nikolay Kuznetsov (JIRA) wrote:

[ http://issues.apache.org/jira/browse/HARMONY-688?page=comments#action_12418290 ]

Nikolay Kuznetsov commented on HARMONY-688:
-------------------------------------------

Yes, we do not support supplementary characters. The main reason is that such 
support breaks the quantifier optimizations over character classes of fixed 
length (we support only length 1 :-)). Now I think I can support two different 
types of character classes: one for a fixed width of 1 (or 2), and a second for 
an unknown width (1 or 2, \\p{javaLowerCase}, for instance).

Great! Now I'm eager for this function. Thanks a lot. ;-)
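The optimization Nikolay mentions can be illustrated with a small sketch. It is not Harmony's implementation: the class and method names (QuantifierSketch, greedyFixedEnd, greedyVariablePositions) are hypothetical, and it assumes Java 8 or later for IntPredicate and method references. For a class of fixed width 1, a greedy X* only needs its end index, because every backtracking position is exactly one char earlier. For a class that may match 1 or 2 chars, the positions have to be tracked by code point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch, not Harmony code: greedy X* over a fixed-width
// class versus a variable-width (1 or 2 UTF-16 units) class.
public class QuantifierSketch {

    // Fixed width 1: each match consumes exactly one char, so only the end
    // index is needed; backtracking can simply decrement the index.
    static int greedyFixedEnd(CharSequence s, int start, IntPredicate cls) {
        int i = start;
        while (i < s.length() && cls.test(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Variable width: a match may consume a surrogate pair, so the
    // positions to backtrack to are recorded explicitly.
    static List<Integer> greedyVariablePositions(CharSequence s, int start, IntPredicate cls) {
        List<Integer> positions = new ArrayList<>();
        int i = start;
        positions.add(i);
        while (i < s.length()) {
            int cp = Character.codePointAt(s, i);
            if (!cls.test(cp)) {
                break;
            }
            i += Character.charCount(cp);
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "a\uD801\uDC28b";  // 'a', U+10428, 'b': 4 chars, 3 code points
        IntPredicate lower = Character::isLowerCase;
        // Prints 1: the fixed-width loop stops at the lone high surrogate U+D801.
        System.out.println(greedyFixedEnd(s, 0, lower));
        // Prints [0, 1, 3, 4]: the pair is stepped over as one code point.
        System.out.println(greedyVariablePositions(s, 0, lower));
    }
}
```

The trade-off matches the proposal above: classes known to match only one char keep the cheap index arithmetic, and only classes that can match a supplementary character pay for code-point stepping.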

BTW, am I right that, leaving Unicode normalization support aside, this problem affects only the behaviour of character classes and ranges?

Yes, I think so.

In all other cases it is impossible to construct a pattern that will 
work incorrectly; if that is not so, could you please give me an example?

I'm not sure. At least, I cannot give an example. ;-)
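To make the character-class case concrete, the sketch below (class name SupplementaryRangeSketch is hypothetical) builds a range over the Deseret block, U+10400 to U+1044F, with surrogate pairs written directly in the Java string literal. An engine that compiles and matches patterns by code point, as the RI does, treats each pair in the range as one unit. A literal supplementary character needs no such handling, since comparing its two chars in sequence gives the same result, which is consistent with Nikolay's view that classes and ranges are where the problem lies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: a range whose endpoints are supplementary characters,
// compared with a literal supplementary character.
public class SupplementaryRangeSketch {

    // Returns {start, end} of the first match, or null if there is none.
    static int[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String text = "x\uD801\uDC00\uD801\uDC28y";  // 'x', U+10400, U+10428, 'y'

        // Range case: each surrogate pair must be tested as one code point.
        int[] r = firstMatch("[\uD801\uDC00-\uD801\uDC4F]+", text);
        System.out.println(r[0] + ".." + r[1]);  // 1..5 on a code-point-aware engine

        // Literal case: matching the two chars one after another works either way.
        int[] l = firstMatch("\uD801\uDC28", text);
        System.out.println(l[0] + ".." + l[1]);  // 3..5
    }
}
```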

Thanks.
   Nik.

java.util.regex.Matcher does not support Unicode supplementary characters
-------------------------------------------------------------------------

         Key: HARMONY-688
         URL: http://issues.apache.org/jira/browse/HARMONY-688
     Project: Harmony
        Type: Bug

  Components: Classlib
    Reporter: Richard Liang

Hello Nikolay,
The following test case passes on the RI but fails on Harmony. Would you please have 
a look at this issue? Thanks a lot.
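    public void test_matcher() {
        Pattern p = Pattern.compile("\\p{javaLowerCase}");
        // U+10428 DESERET SMALL LETTER LONG I, a lowercase letter outside
        // the BMP, written as a UTF-16 surrogate pair
        Matcher matcher = p.matcher("\uD801\uDC28");
        assertTrue(matcher.find());
    }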
Best regards,
Richard
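The failing case comes down to code points versus UTF-16 units. The sketch below (class name Harmony688Check is hypothetical) runs the same check without JUnit and shows why a per-char class test cannot pass: U+10428 is lowercase as a code point, but neither of its surrogate halves is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone version of the test above, without JUnit.
public class Harmony688Check {

    static final String S = "\uD801\uDC28";  // U+10428 DESERET SMALL LETTER LONG I

    public static void main(String[] args) {
        int cp = S.codePointAt(0);  // 0x10428

        // As a code point the character is lowercase...
        System.out.println(Character.isLowerCase(cp));           // true
        // ...but neither UTF-16 half is, so a char-by-char class test fails.
        System.out.println(Character.isLowerCase(S.charAt(0)));  // false
        System.out.println(Character.isLowerCase(S.charAt(1)));  // false

        // On a code-point-aware engine the single match covers both chars.
        Matcher m = Pattern.compile("\\p{javaLowerCase}").matcher(S);
        System.out.println(m.find() && m.start() == 0 && m.end() == 2);  // true
    }
}
```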


--
Richard Liang

China Software Development Lab, IBM

