Hi.
Jon Stevens > Q: There are two tests that don't pass. Why?
Having got home and looked at the tests, I think I did not answer the question
properly earlier. I assume you mean: why should these not match?
Jon Stevens > #159
If so: in #159 the www is preceded by a period.
The regex requires that the first character of the domain name be
alphanumeric or a hyphen:
http://.www.test.com
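The #159 reasoning can be checked with a small Java sketch. It uses the JDK's java.util.regex engine rather than org.apache.regexp (an assumption: this pattern only uses literals, character classes, {2}, ?, * and anchors, which both engines should treat alike), and the class name Test159 is hypothetical.

```java
import java.util.regex.Pattern;

class Test159 {
    // Pattern from test #159; backslashes are doubled for the Java string literal.
    static final Pattern URL = Pattern.compile(
        "^(?:([hH][tT]{2}[pP]|[fF][tT][pP]):\\/\\/)?[a-zA-Z0-9\\-]+(\\.[a-zA-Z0-9\\-]+)*$");

    // True if the whole input is accepted by the #159 expression.
    static boolean accepts(String input) {
        return URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // After "http://" the next character must be in [a-zA-Z0-9-], so the leading '.' is rejected.
        System.out.println(accepts("http://.www.test.com")); // false
        // Without the stray period the same input is accepted.
        System.out.println(accepts("http://www.test.com"));  // true
    }
}
```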
Jon Stevens > #161
The regex only matches the http and ftp protocols (in any mix of upper and lower case), but not FtTP.
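A similar hypothetical sketch, again with java.util.regex standing in for org.apache.regexp, runs the #160 to #162 inputs through their shared all-cluster pattern:

```java
import java.util.regex.Pattern;

class Test161 {
    // Pattern shared by tests #160-#162 (only non-capturing groups).
    static final Pattern URL = Pattern.compile(
        "^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\\/\\/)?[a-zA-Z0-9\\-]+(?:\\.[a-zA-Z0-9\\-]+)*$");

    static boolean accepts(String input) {
        return URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // [hH][tT]{2}[pP] accepts any casing of "http"; [fF][tT][pP] any casing of "ftp".
        System.out.println(accepts("FtP://www.test.com"));  // #160: true
        // "FtTP" is neither: the F rules out http, and the T in third position rules out ftp.
        // Without a protocol, ':' is not allowed in the host part, so the match fails.
        System.out.println(accepts("FtTP://www.test.com")); // #161: false
        System.out.println(accepts("www.test.com"));        // #162: true (protocol is optional)
    }
}
```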
NOTE: "Match: NO" means the expression correctly failed to match, i.e. the test passed.
If one of the tests actually fails, a big banner like this appears:
*********************************************
************Failure*************************
*********************************************
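To spell out the YES/NO semantics, here is a hypothetical helper (not RETest's actual code, and using java.util.regex instead of org.apache.regexp): a case passes whenever the real outcome agrees with the expected one, so an expected NO that does not match is a pass.

```java
import java.util.regex.Pattern;

class ExpectationCheck {
    // Hypothetical helper: a case passes when the actual match result
    // equals the expected YES (true) or NO (false).
    static boolean passes(String regex, String input, boolean expectMatch) {
        // find() searches anywhere in the input; the anchors in the pattern
        // below still force a whole-string match.
        boolean matched = Pattern.compile(regex).matcher(input).find();
        return matched == expectMatch;
    }

    public static void main(String[] args) {
        String re = "^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\\/\\/)?[a-zA-Z0-9\\-]+(?:\\.[a-zA-Z0-9\\-]+)*$";
        // Expected NO, no match found: the case passes.
        System.out.println(passes(re, "FtTP://www.test.com", false)); // true
        // Had the file said YES, the same outcome would be a failure.
        System.out.println(passes(re, "FtTP://www.test.com", true));  // false
    }
}
```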
As for calling test() as opposed to new RETest(args):
I have included another patch to clean this up (a repeat of the
previous patch, with additions).
I think someone intended to clean this up earlier but did not finish or got
distracted, since the javadoc says one thing and the code does another.
I think it is on track now.
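The argument handling that the patched test(String[] args) performs can be summarised by this hypothetical, side-effect-free Java sketch; RETestArgs and mode() are made-up names, and the sketch only reports which mode would run instead of running any tests.

```java
class RETestArgs {
    // Hypothetical helper mirroring the branching in the patched
    // RETest.test(String[] args); it returns a label instead of running tests.
    static String mode(String[] args) {
        if (args.length == 2) {
            // e.g. "-i <regex>": interactive tests against args[1]
            return "interactive:" + args[1];
        } else if (args.length == 1) {
            // a single argument is taken as the path to a test file
            return "automated:" + args[0];
        } else {
            // any other count: print usage, then fall back to the default file
            return "automated:docs/RETest.txt";
        }
    }

    public static void main(String[] args) {
        System.out.println(mode(new String[] {"-i", "a(b)c"}));
        System.out.println(mode(new String[] {"xdocs/RETest.txt"}));
        System.out.println(mode(new String[0]));
    }
}
```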
Michael
Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.1
diff -r1.1 RETest.txt
886a887,980
>
> #149
> (?:a)
> a
> YES
> a
>
> #150
> (?:a)
> aa
> YES
> a
>
> #151
> (?:\w)
> abc
> YES
> a
>
> #152
> (?:\w\s\w)+
> a b c
> YES
> a b
>
> #153
> (a\w)(?:,(a\w))+
> ab,ac,ad
> YES
> ab,ac,ad
> ab
> ad
>
> #154
> z(\w\s+(?:\w\s+\w)+)z
> za b bc cd dz
> YES
> za b bc cd dz
> a b bc cd d
>
> #155
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> http://www.test.com
> YES
> http://www.test.com
> http://
> http
> .com
>
> #156
> ((?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> ftp://www.test.com
> YES
> ftp://www.test.com
> ftp://
> .com
>
> #157
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*
> htTp://www.test.com
> YES
> htTp://www.test.com
> htTp://
> htTp
>
> #158
> (?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> FTP://www.test.com
> YES
> FTP://www.test.com
> FTP
> .com
>
> #159
> ^(?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*$
> http://.www.test.com
> NO
>
> #160
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtP://www.test.com
> YES
> FtP://www.test.com
>
> #161
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtTP://www.test.com
> NO
>
> #162
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> www.test.com
> YES
> www.test.com
Index: src/java/org/apache/regexp/RE.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RE.java,v
retrieving revision 1.6
diff -r1.6 RE.java
176,186c176,186
< * [:alnum:] Alphanumeric characters.
< * [:alpha:] Alphabetic characters.
< * [:blank:] Space and tab characters.
< * [:cntrl:] Control characters.
< * [:digit:] Numeric characters.
< * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
< * [:lower:] Lower-case alphabetic characters.
< * [:print:] Printable characters (characters that are not control characters.)
< * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
< * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
< * [:upper:] Upper-case alphabetic characters.
---
> * [:alnum:] Alphanumeric characters.
> * [:alpha:] Alphabetic characters.
> * [:blank:] Space and tab characters.
> * [:cntrl:] Control characters.
> * [:digit:] Numeric characters.
> * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
> * [:lower:] Lower-case alphabetic characters.
> * [:print:] Printable characters (characters that are not control characters.)
> * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
> * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
> * [:upper:] Upper-case alphabetic characters.
188c188
< *
---
> *
199c199
< *
---
> *
254a255
> * (?:A) Used for subexpression clustering (just like grouping but no backrefs)
399a401
> static final char OP_OPEN_CLUSTER = '<'; // opening cluster
400a403
> static final char OP_CLOSE_CLUSTER = '>'; // closing cluster
421c424
< static final char POSIX_CLASS_ALPHA = 'a'; // Alphabetics
---
> static final char POSIX_CLASS_ALPHA = 'a'; // Alphabetics
947a951,955
>
> case OP_OPEN_CLUSTER:
> case OP_CLOSE_CLUSTER:
> // starting or ending the matching of a subexpression which has no backref.
> return matchNodes( next, maxNode, idx );
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.2
diff -r1.2 RECompiler.java
1191c1191
< boolean paren = false;
---
> int paren = -1;
1196,1198c1196,1208
< idx++;
< paren = true;
< ret = node(RE.OP_OPEN, parens++);
---
> // if it's a cluster ( rather than a proper subexpression ie with backrefs )
> if ( idx + 2 < len && pattern.charAt( idx + 1 ) == '?' && pattern.charAt( idx + 2 ) == ':' )
> {
> paren = 2;
> idx += 3;
> ret = node( RE.OP_OPEN_CLUSTER, 0 );
> }
> else
> {
> paren = 1;
> idx++;
> ret = node(RE.OP_OPEN, parens++);
> }
1223c1233
< if (paren)
---
> if ( paren > 0 )
1233c1243,1250
< end = node(RE.OP_CLOSE, closeParens);
---
> if ( paren == 1 )
> {
> end = node(RE.OP_CLOSE, closeParens);
> }
> else
> {
> end = node( RE.OP_CLOSE_CLUSTER, 0 );
> }
Index: src/java/org/apache/regexp/RETest.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RETest.java,v
retrieving revision 1.2
diff -r1.2 RETest.java
58c58
< */
---
> */
85c85
< public static void main(String[] arg)
---
> public static void main(String[] args)
89,90c89
< //new RETest(arg);
< test();
---
> test( args );
103c102
< public static boolean test() throws Exception
---
> public static boolean test( String[] args ) throws Exception
106c105,121
< test.runAutomatedTests("docs/RETest.txt");
---
> // Run interactive tests against a single regexp
> if (args.length == 2)
> {
> test.runInteractiveTests(args[1]);
> }
> else if (args.length == 1)
> {
> // Run automated tests
> test.runAutomatedTests(args[0]);
> }
> else
> {
> System.out.println( "Usage: RETest ([-i] [regex]) ([/path/to/testfile.txt])" );
> System.out.println( "By Default will run automated tests from file 'docs/RETest.txt' ..." );
> System.out.println();
> test.runAutomatedTests("docs/RETest.txt");
> }
118,146d132
< * Constructor for test
< * @param arg Command line arguments
< */
< public RETest(String[] arg)
< {
< try
< {
< // Run interactive tests against a single regexp
< if (arg.length == 2)
< {
< runInteractiveTests(arg[1]);
< }
< else if (arg.length == 1)
< {
< // Run automated tests
< runAutomatedTests(arg[0]);
< }
< else
< {
< System.out.println ( "Usage: RETest ([-i] [regex]) ([/path/to/testfile.txt])" );
< }
< }
< catch (Exception e)
< {
< e.printStackTrace();
< }
< }
<
< /**
162c148
<
---
>
Index: xdocs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/xdocs/RETest.txt,v
retrieving revision 1.1
diff -r1.1 RETest.txt
886a887,980
>
> #149
> (?:a)
> a
> YES
> a
>
> #150
> (?:a)
> aa
> YES
> a
>
> #151
> (?:\w)
> abc
> YES
> a
>
> #152
> (?:\w\s\w)+
> a b c
> YES
> a b
>
> #153
> (a\w)(?:,(a\w))+
> ab,ac,ad
> YES
> ab,ac,ad
> ab
> ad
>
> #154
> z(\w\s+(?:\w\s+\w)+)z
> za b bc cd dz
> YES
> za b bc cd dz
> a b bc cd d
>
> #155
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> http://www.test.com
> YES
> http://www.test.com
> http://
> http
> .com
>
> #156
> ((?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> ftp://www.test.com
> YES
> ftp://www.test.com
> ftp://
> .com
>
> #157
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*
> htTp://www.test.com
> YES
> htTp://www.test.com
> htTp://
> htTp
>
> #158
> (?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> FTP://www.test.com
> YES
> FTP://www.test.com
> FTP
> .com
>
> #159
> ^(?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*$
> http://.www.test.com
> NO
>
> #160
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtP://www.test.com
> YES
> FtP://www.test.com
>
> #161
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtTP://www.test.com
> NO
>
> #162
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> www.test.com
> YES
> www.test.com
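As a cross-check on the expected output of #153, here is a sketch using the JDK's java.util.regex, which already supports (?:...) and is assumed here to number groups the way the patched org.apache.regexp is expected to (ClusterDemo and groups() are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterDemo {
    // Returns group 0 plus every capturing group of the first match, or null if none.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Test #153: the (?:...) cluster repeats ",a\w" without creating a group of
        // its own, so only the two (a\w) groups are numbered; group 2 keeps the
        // last repetition. Prints ab,ac,ad then ab then ad.
        for (String s : groups("(a\\w)(?:,(a\\w))+", "ab,ac,ad")) {
            System.out.println(s);
        }
    }
}
```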