I documented the details on both JIRA issues:
http://issues.apache.org/jira/browse/HARMONY-688
http://issues.apache.org/jira/browse/HARMONY-933
So please mark these issues as non-bug differences if needed.

Thanks,
Anton

On 10/12/06, Paulex Yang <[EMAIL PROTECTED]> wrote:
Anton Ivanov wrote:
> The problem is in the RI. These failures are RI bugs.
>
> The test failures on the RI you pointed out can be grouped into the two
I guess you meant three ;-)
> categories:
Is category 2, the supplementary character issue, included in HARMONY-933? How about documenting the details below on that JIRA and marking it as a non-bug difference?
>
> 1. Canonical equivalence related.
>
> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> ^
>     at java.util.regex.Pattern.error(Pattern.java:1650)
>     at java.util.regex.Pattern.accept(Pattern.java:1508)
>     at java.util.regex.Pattern.group0(Pattern.java:2460)
>     at java.util.regex.Pattern.sequence(Pattern.java:1715)
>     at java.util.regex.Pattern.expr(Pattern.java:1687)
>     at java.util.regex.Pattern.compile(Pattern.java:1397)
>     at java.util.regex.Pattern.<init>(Pattern.java:1124)
>     at java.util.regex.Pattern.compile(Pattern.java:840)
>     at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>
> The RI fails to compile the following pattern with the CANON_EQ flag
> specified:
> "\u01E0\u00CCcdb(ac)"
> This is because the RI tries to build alternations to take canonical
> equivalence into account. It does so correctly in simple cases, but if
> the pattern is a little more complex, it builds these alternations
> wrongly and fails to compile the pattern.
> See the following bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
>
> I wrote about these test failures on the RI here:
> http://issues.apache.org/jira/browse/HARMONY-933
>
> 2. Supplementary Unicode codepoints related.
>
> For example, let's look at:
>
> testPredefinedClassesWithSurrogatesSupplementary
> junit.framework.AssertionFailedError: null
>     at junit.framework.Assert.fail(Assert.java:47)
>     at junit.framework.Assert.assertTrue(Assert.java:20)
>     at junit.framework.Assert.assertFalse(Assert.java:34)
>     at junit.framework.Assert.assertFalse(Assert.java:41)
>     at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>
> Here we try to find a surrogate character in the code point \uD916\uDE27.
> As stated at
> http://www.unicode.org/reports/tr18/#Supplementary_Characters:
>
> "Surrogate pairs (or their equivalents in other encoding forms) are be
> handled internally as single code point values"
>
> So we have to treat text as code points, not code units.
> Here \uD916\uDE27 is one code point consisting of two code units
> (two surrogate characters), so we find nothing.
> (I added a comment with this explanation to
> testPredefinedClassesWithSurrogatesSupplementary().)
> But the RI doesn't treat this code point as a single unit. This is an RI
> bug, and it is wrong according to the technical report.
>
> 3. Error messages
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
>  ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
>     ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
>      ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
>        ^
> These are printed by testCompileStringint().
> This test is needed to verify that appropriate exceptions are thrown
> when we compile a malformed regular expression.
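The first two categories hinge on two Unicode facts that can be checked with standard JDK calls. The sketch below is illustrative only: the class name is hypothetical, it uses the simpler "a\u030A" / "\u00E5" pair from the java.util.regex.Pattern Javadoc for CANON_EQ instead of the test's "\u01E0\u00CCcdb(ac)" pattern, and it does not attempt to reproduce the RI failures. It shows that canonically equivalent strings share one NFD form (via java.text.Normalizer), and that \uD916\uDE27 is two code units but one code point whose general category is not Cs (surrogate), even though each half on its own is.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class UnicodeEquivalenceDemo {

    // Category 1 (canonical equivalence): U+00E5 and "a" + U+030A are canonically
    // equivalent. This pair is the CANON_EQ example from the Pattern Javadoc.
    static boolean canonEqMatches() {
        return Pattern.compile("a\u030A", Pattern.CANON_EQ).matcher("\u00E5").matches();
    }

    // Without CANON_EQ the two-char pattern cannot match the one-char string.
    static boolean plainMatches() {
        return Pattern.compile("a\u030A").matcher("\u00E5").matches();
    }

    // Canonically equivalent strings have the same NFD (canonical decomposition).
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Formats each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Category 2 (supplementary code points): the surrogate pair from the Harmony test.
    static final String PAIR = "\uD916\uDE27";

    public static void main(String[] args) {
        System.out.println("CANON_EQ match: " + canonEqMatches()); // true
        System.out.println("plain match:    " + plainMatches());   // false
        System.out.println("NFD of U+01E0:  " + codePoints(nfd("\u01E0")));
        System.out.println("code units:     " + PAIR.length());
        System.out.println("code points:    " + PAIR.codePointCount(0, PAIR.length()));
        System.out.println("as code point:  " + codePoints(PAIR));
        int cp = PAIR.codePointAt(0);
        // Each half, viewed as a lone code unit, has general category Cs (surrogate) ...
        System.out.println("unit 0 is Cs:   " + (Character.getType(PAIR.charAt(0)) == Character.SURROGATE));
        // ... but the code point the two halves form together is not a surrogate.
        System.out.println("pair is Cs:     " + (Character.getType(cp) == Character.SURROGATE));
    }
}
```

Because the combined code point is not a surrogate, an engine that works on code points, as the quoted report asks, should find no surrogate in that string, which is consistent with what the Harmony test expects.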
>
> Thanks,
> Anton
>
> On 10/12/06, Spark Shen <[EMAIL PROTECTED]> wrote:
>>
>> Anton Ivanov wrote:
>> > On 10/10/06, Anton Ivanov <[EMAIL PROTECTED]> wrote:
>> >>
>> >> On 10/10/06, Tim Ellison <[EMAIL PROTECTED]> wrote:
>> >> >
>> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> >> > regex unit tests, but it causes the existing luni tests to fail in
>> >> > java.util.Scanner. I've not figured out the root cause of the failure,
>> >> > so I've backed out the changes.
>> >> >
>> >> > Regards,
>> >> > Tim
>> >> >
>> >> > --
>> >> >
>> >> > Tim Ellison ([EMAIL PROTECTED])
>> >> > IBM Java technology centre, UK.
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > Terms of use : http://incubator.apache.org/harmony/mailing.html
>> >> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >>
>> >> This is my patch.
>> >> I'll look into this problem and try to correct the patch.
>> >>
>> >> Thanks,
>> >> Anton
>> >>
>> > There was a bug in the newly created class SupplRangeSet.java.
>> > The method matches() of SupplRangeSet.java contained the following code:
>> > ...
>> > if (stringIndex < strLength) {
>> >     char high = testString.charAt(stringIndex++);
>> >
>> >     if (contains(high) &&
>> >             next.matches(stringIndex, testString, matchResult) > 0) {
>> >         return 1;
>> >     }
>> > ...
>> > But it is wrong simply to return 1, even though the comment on the
>> > method matches() in AbstractSet.java reads:
>> >
>> > "Checks if this node matches in given position and recursively call
>> > next node matches on positive self match. Returns positive integer if
>> > entire match succeed, negative otherwise
>> > return -1 if match fails or n > 0;"
>> >
>> > In fact, matches() does not return just any positive n > 0: on a
>> > successful match attempt, n is an offset.
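The return-value contract quoted here can be illustrated with a toy model. This is not Harmony's AbstractSet API: the Node interface, its two-argument matches() (no MatchResult), the class name, and the assumption that the returned offset is the index where the whole match ends are all simplifications for illustration. Each node consumes one char and, if the rest of the chain matches, returns what the rest returned (as in the corrected code); the buggy variant returns 1 instead (as in the original code).

```java
import java.util.function.IntPredicate;

class OffsetPropagationDemo {

    // Hypothetical, simplified node: returns a positive offset (here, the index
    // where the whole match ends) on success, or -1 on failure.
    interface Node {
        int matches(int stringIndex, CharSequence testString);
    }

    // Terminal node: the whole chain matched, and the match ends at stringIndex.
    static final Node FINAL = (stringIndex, testString) -> stringIndex;

    // Consumes one char accepted by 'contains', then asks 'next' to match the rest.
    // buggy == true mimics the original SupplRangeSet code (return 1);
    // buggy == false mimics the corrected code (return offset).
    static Node charNode(IntPredicate contains, Node next, boolean buggy) {
        return (stringIndex, testString) -> {
            int offset = -1;
            if (stringIndex < testString.length()) {
                char high = testString.charAt(stringIndex++);
                if (contains.test(high)
                        && (offset = next.matches(stringIndex, testString)) > 0) {
                    return buggy ? 1 : offset;
                }
            }
            return -1;
        };
    }

    // Chain matching 'a' followed by 'b'; only the first node may be buggy.
    static Node ab(boolean buggyFirstNode) {
        Node b = charNode(c -> c == 'b', FINAL, false);
        return charNode(c -> c == 'a', b, buggyFirstNode);
    }

    public static void main(String[] args) {
        System.out.println("correct:  " + ab(false).matches(0, "abc")); // 2: match ends at index 2
        System.out.println("buggy:    " + ab(true).matches(0, "abc"));  // 1: success, but wrong end
        System.out.println("no match: " + ab(false).matches(0, "xbc")); // -1
    }
}
```

In this model the buggy chain still reports success, so a check that only looks for a positive result passes, while a caller that uses the value as the end of the match sees 1 instead of 2.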
>> > This fact is taken into account in all the existing classes of
>> > java.util.regex, but I forgot it in SupplRangeSet.java.
>> > So I corrected the method matches() of the SupplRangeSet class as follows:
>> > ...
>> > int offset = -1;
>> > if (stringIndex < strLength) {
>> >     char high = testString.charAt(stringIndex++);
>> >
>> >     if (contains(high) &&
>> >             (offset = next.matches(stringIndex, testString,
>> >                     matchResult)) > 0) {
>> >         return offset;
>> >     }
>> > ...
>> > I corrected the patch and attached it to the issue.
>> > I verified that the regex and luni tests pass with the patch applied.
>> >
>> > Thanks,
>> > Anton
>> >
>> Hi Anton:
>> It must be very exciting to handle such a complex problem. :-)
>>
>> But after applying the new patch (and the test patch), I still got
>> problems.
>> In the test class org.apache.harmony.tests.java.util.regex.PatternTest,
>> 4 test methods fail on the RI:
>>
>> testCanonEqFlag:
>> java.util.regex.PatternSyntaxException: Unclosed group near index 59
>> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> ^
>>     at java.util.regex.Pattern.error(Pattern.java:1650)
>>     at java.util.regex.Pattern.accept(Pattern.java:1508)
>>     at java.util.regex.Pattern.group0(Pattern.java:2460)
>>     at java.util.regex.Pattern.sequence(Pattern.java:1715)
>>     at java.util.regex.Pattern.expr(Pattern.java:1687)
>>     at java.util.regex.Pattern.compile(Pattern.java:1397)
>>     at java.util.regex.Pattern.<init>(Pattern.java:1124)
>>     at java.util.regex.Pattern.compile(Pattern.java:840)
>>     at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>>
>> testIndexesCanonicalEq:
>> junit.framework.AssertionFailedError: null
>>     at junit.framework.Assert.fail(Assert.java:47)
>>     at junit.framework.Assert.assertTrue(Assert.java:20)
>>     at junit.framework.Assert.assertTrue(Assert.java:27)
>>     at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)
>>
>> testCanonEqFlagWithSupplementaryCharacters:
>> junit.framework.AssertionFailedError: null
>>     at junit.framework.Assert.fail(Assert.java:47)
>>     at junit.framework.Assert.assertTrue(Assert.java:20)
>>     at junit.framework.Assert.assertTrue(Assert.java:27)
>>     at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)
>>
>> testPredefinedClassesWithSurrogatesSupplementary:
>> junit.framework.AssertionFailedError: null
>>     at junit.framework.Assert.fail(Assert.java:47)
>>     at junit.framework.Assert.assertTrue(Assert.java:20)
>>     at junit.framework.Assert.assertFalse(Assert.java:34)
>>     at junit.framework.Assert.assertFalse(Assert.java:41)
>>     at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>>
>> If these are RI bugs, shall we add comments about them in the test
>> cases?
>>
>> Also, error messages are printed to the console on Harmony. Since some
>> test cases use System.out instead of assertions, I could not locate
>> where these error messages come from:
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> b)a
>>  ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> bcde)a
>>     ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> bbg())a
>>      ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> cdb(?i))a
>>        ^
>> And last, the good news is that the luni tests do pass.
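The concern about System.out can be addressed by asserting on the exception instead of printing it. A minimal sketch in plain Java, without JUnit so it stays self-contained; the class and helper names are hypothetical, and this is not the actual testCompileStringint() code. It checks only that a PatternSyntaxException is thrown for each malformed pattern listed above and that getPattern() returns the offending regex; the message text and reported index may differ between implementations, so it does not compare them.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class MalformedPatternCheck {

    // Malformed patterns taken from the console output above (each has an unmatched ')').
    static final String[] MALFORMED = {"b)a", "bcde)a", "bbg())a", "cdb(?i))a"};

    // Hypothetical helper: returns the exception instead of printing it, so a
    // test can assert on it; throws AssertionError if the pattern compiles.
    static PatternSyntaxException expectSyntaxError(String regex) {
        try {
            Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            return e;
        }
        throw new AssertionError("expected PatternSyntaxException for: " + regex);
    }

    public static void main(String[] args) {
        for (String regex : MALFORMED) {
            PatternSyntaxException e = expectSyntaxError(regex);
            // Check a stable property rather than implementation-specific message text.
            if (!regex.equals(e.getPattern())) {
                throw new AssertionError("unexpected pattern in exception: " + e.getPattern());
            }
            System.out.println("rejected as expected: " + regex + " (index " + e.getIndex() + ")");
        }
    }
}
```

Called from a test method, a pattern that unexpectedly compiles then shows up as a test failure rather than as a line on the console.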
>> :-)
>>
>> Best regards
>>
>> --
>> Spark Shen
>> China Software Development Lab, IBM
>>

--
Paulex Yang
China Software Development Lab
IBM