jon 01/03/09 14:17:16
Modified: src/java/org/apache/regexp RE.java
Log:
Fixed javadoc examples that would not compile.
Thanks to Vladimir Tsichevski <[EMAIL PROTECTED]> for the find,
and to Iain Lowe <[EMAIL PROTECTED]> for the fix.
Fixed some of the 80-column wrapping.
Revision Changes Path
1.9 +108 -93 jakarta-regexp/src/java/org/apache/regexp/RE.java
Index: RE.java
===================================================================
RCS file: /home/cvs/jakarta-regexp/src/java/org/apache/regexp/RE.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- RE.java 2001/02/20 01:18:45 1.8
+++ RE.java 2001/03/09 22:17:13 1.9
@@ -60,20 +60,21 @@
import java.util.Vector;
/**
- * RE is an efficient, lightweight regular expression evaluator/matcher class.
- * Regular expressions are pattern descriptions which enable sophisticated matching of
- * strings. In addition to being able to match a string against a pattern, you
- * can also extract parts of the match. This is especially useful in text parsing!
- * Details on the syntax of regular expression patterns are given below.
+ * RE is an efficient, lightweight regular expression evaluator/matcher
+ * class. Regular expressions are pattern descriptions which enable
+ * sophisticated matching of strings. In addition to being able to
+ * match a string against a pattern, you can also extract parts of the
+ * match. This is especially useful in text parsing! Details on the
+ * syntax of regular expression patterns are given below.
*
* <p>
*
- * To compile a regular expression (RE), you can simply construct an RE matcher
- * object from the string specification of the pattern, like this:
+ * To compile a regular expression (RE), you can simply construct an RE
+ * matcher object from the string specification of the pattern, like this:
*
* <pre>
*
- * RE r = new RE("a*b");
+ * RE r = new RE("a*b");
*
* </pre>
*
@@ -84,7 +85,7 @@
*
* <pre>
*
- * boolean matched = r.match("aaaab");
+ * boolean matched = r.match("aaaab");
*
* </pre>
*
@@ -92,43 +93,43 @@
* pattern "a*b" matches the string "aaaab".
*
* <p>
- * If you were interested in the <i>number</i> of a's which matched the first
- * part of our example expression, you could change the expression to
+ * If you were interested in the <i>number</i> of a's which matched the
+ * first part of our example expression, you could change the expression to
* "(a*)b". Then when you compiled the expression and matched it against
* something like "xaaaab", you would get results like this:
*
* <pre>
*
- * RE r = new RE("(a*)b"); // Compile expression
- * boolean matched = r.match("xaaaab"); // Match against "xaaaab"
+ * RE r = new RE("(a*)b"); // Compile expression
+ * boolean matched = r.match("xaaaab"); // Match against "xaaaab"
*
* <br>
*
- * String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
- * String insideParens = r.getParen(1); // insideParens will be 'aaaa'
+ * String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
+ * String insideParens = r.getParen(1); // insideParens will be 'aaaa'
*
* <br>
*
- * int startWholeExpr = getParenStart(0); // startWholeExpr will be index 1
- * int endWholeExpr = getParenEnd(0); // endWholeExpr will be index 6
- * int lenWholeExpr = getParenLength(0); // lenWholeExpr will be 5
+ * int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
+ * int endWholeExpr = r.getParenEnd(0); // endWholeExpr will be index 6
+ * int lenWholeExpr = r.getParenLength(0); // lenWholeExpr will be 5
*
* <br>
*
- * int startInside = getParenStart(1); // startInside will be index 1
- * int endInside = getParenEnd(1); // endInside will be index 5
- * int lenInside = getParenLength(1); // lenInside will be 4
+ * int startInside = r.getParenStart(1); // startInside will be index 1
+ * int endInside = r.getParenEnd(1); // endInside will be index 5
+ * int lenInside = r.getParenLength(1); // lenInside will be 4
*
* </pre>
*
- * You can also refer to the contents of a parenthesized expression within
- * a regular expression itself. This is called a 'backreference'. The first
- * backreference in a regular expression is denoted by \1, the second by \2
- * and so on. So the expression:
+ * You can also refer to the contents of a parenthesized expression
+ * within a regular expression itself. This is called a
+ * 'backreference'. The first backreference in a regular expression is
+ * denoted by \1, the second by \2 and so on. So the expression:
*
* <pre>
*
- * ([0-9]+)=\1
+ * ([0-9]+)=\1
*
* </pre>
*
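The capture-group offsets in the example above can be checked with a small runnable sketch. It uses java.util.regex (JDK 1.4 and later, so it postdates this commit) as a stand-in for org.apache.regexp.RE, so it needs nothing beyond the JDK. Matcher.group/start/end play the roles of getParen/getParenStart/getParenEnd, and find() mirrors RE.match() locating "aaaab" inside "xaaaab". The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class ParenOffsetsDemo {
    // Returns {start, end, length} of group g in the first match of regex
    // within input, or null when nothing matches.
    static int[] offsets(String regex, String input, int g) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(g), m.end(g), m.end(g) - m.start(g) };
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(a*)b").matcher("xaaaab");
        if (m.find()) {
            System.out.println(m.group(0)); // aaaab (compare getParen(0))
            System.out.println(m.group(1)); // aaaa  (compare getParen(1))
        }
        int[] whole = offsets("(a*)b", "xaaaab", 0);
        System.out.println(whole[0] + " " + whole[1] + " " + whole[2]);    // 1 6 5
        int[] inside = offsets("(a*)b", "xaaaab", 1);
        System.out.println(inside[0] + " " + inside[1] + " " + inside[2]); // 1 5 4
    }
}
```

The end offsets are exclusive, which is why the length is simply end minus start.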
@@ -146,12 +147,12 @@
*
* <br>
*
- * <i>unicodeChar</i> Matches any identical unicode character
+ * <i>unicodeChar</i> Matches any identical unicode character
* \ Used to quote a meta-character (like '*')
* \\ Matches a single '\' character
* \0nnn Matches a given octal character
* \xhh Matches a given 8-bit hexadecimal character
- * \\uhhhh Matches a given 16-bit hexadecimal character
+ * \\uhhhh Matches a given 16-bit hexadecimal character
* \t Matches an ASCII tab character
* \n Matches an ASCII newline character
* \r Matches an ASCII return character
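A sketch of several escapes from the table, assuming java.util.regex as a stand-in for RE. The syntax shown overlaps between the two engines, but RE-specific edge cases are not covered. The class and helper names are hypothetical; note the doubled backslashes that Java string literals require.

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class EscapeDemo {
    // True when the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("\\*", "*"));     // true: quoted meta-character
        System.out.println(whole("\\\\", "\\"));   // true: one literal backslash
        System.out.println(whole("\\0101", "A"));  // true: octal 101 is 'A'
        System.out.println(whole("\\x41", "A"));   // true: 8-bit hex escape
        System.out.println(whole("\\u0042", "B")); // true: 16-bit hex escape
        System.out.println(whole("\\t", "\t"));    // true: tab
    }
}
```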
@@ -178,17 +179,23 @@
* [:blank:] Space and tab characters.
* [:cntrl:] Control characters.
* [:digit:] Numeric characters.
- * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
+ * [:graph:] Characters that are printable and are also visible.
+ * (A space is printable, but not visible, while an
+ * `a' is both.)
* [:lower:] Lower-case alphabetic characters.
- * [:print:] Printable characters (characters that are not control characters.)
- * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
- * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
+ * [:print:] Printable characters (characters that are not
+ * control characters.)
+ * [:punct:] Punctuation characters (characters that are not letters,
+ * digits, control characters, or space characters).
+ * [:space:] Space characters (such as space, tab, and formfeed,
+ * to name a few).
* [:upper:] Upper-case alphabetic characters.
* [:xdigit:] Characters that are hexadecimal digits.
*
* <br>
*
- * <b><font face=times roman>Non-standard POSIX-style Character Classes</font></b>
+ * <b><font face=times roman>Non-standard POSIX-style Character
+ * Classes</font></b>
*
* <br>
*
@@ -201,13 +208,13 @@
*
* <br>
*
- * . Matches any character other than newline
- * \w Matches a "word" character (alphanumeric plus "_")
- * \W Matches a non-word character
- * \s Matches a whitespace character
- * \S Matches a non-whitespace character
- * \d Matches a digit character
- * \D Matches a non-digit character
+ * . Matches any character other than newline
+ * \w Matches a "word" character (alphanumeric plus "_")
+ * \W Matches a non-word character
+ * \s Matches a whitespace character
+ * \S Matches a non-whitespace character
+ * \d Matches a digit character
+ * \D Matches a non-digit character
*
* <br>
*
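The predefined classes can be exercised with a short sketch that assumes java.util.regex as a stand-in for RE. The class name and the firstMatch helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class PredefinedClassDemo {
    // Returns the first substring matching regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "order 42 shipped")); // 42
        System.out.println(firstMatch("\\w+", "  foo_bar9!"));      // foo_bar9
        System.out.println(firstMatch("\\S+", "   hi there"));      // hi
        System.out.println(firstMatch("\\D+", "123abc456"));        // abc
        // By default '.' does not match a line terminator such as newline.
        System.out.println(Pattern.matches(".", "\n"));             // false
    }
}
```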
@@ -215,10 +222,10 @@
*
* <br>
*
- * ^ Matches only at the beginning of a line
- * $ Matches only at the end of a line
- * \b Matches only at a word boundary
- * \B Matches only at a non-word boundary
+ * ^ Matches only at the beginning of a line
+ * $ Matches only at the end of a line
+ * \b Matches only at a word boundary
+ * \B Matches only at a non-word boundary
*
* <br>
*
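A sketch of the boundary matchers, assuming java.util.regex as a stand-in for RE; the class and helper names are hypothetical. One difference to flag: java.util.regex lets ^ and $ match at line breaks only when Pattern.MULTILINE is set, while the table above describes line semantics; how RE itself treats multi-line input is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class BoundaryDemo {
    // Counts non-overlapping matches of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat concatenate cat.";
        System.out.println(count(Pattern.compile("\\bcat\\b"), text)); // 2
        System.out.println(count(Pattern.compile("\\Bcat\\B"), text)); // 1
        String lines = "one\ntwo\nthree";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count(Pattern.compile("^\\w+"), lines));                    // 1
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), lines)); // 3
    }
}
```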
@@ -226,12 +233,12 @@
*
* <br>
*
- * A* Matches A 0 or more times (greedy)
- * A+ Matches A 1 or more times (greedy)
- * A? Matches A 1 or 0 times (greedy)
- * A{n} Matches A exactly n times (greedy)
- * A{n,} Matches A at least n times (greedy)
- * A{n,m} Matches A at least n but not more than m times (greedy)
+ * A* Matches A 0 or more times (greedy)
+ * A+ Matches A 1 or more times (greedy)
+ * A? Matches A 1 or 0 times (greedy)
+ * A{n} Matches A exactly n times (greedy)
+ * A{n,} Matches A at least n times (greedy)
+ * A{n,m} Matches A at least n but not more than m times (greedy)
*
* <br>
*
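The greedy closures can be illustrated with a sketch that assumes java.util.regex as a stand-in for RE; the class and helper names are hypothetical. The last case shows that a closure allowing zero repetitions can succeed with an empty match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class GreedyClosureDemo {
    // Returns the first substring matching regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a{2}", "aaaa"));   // aa
        System.out.println(firstMatch("a{2,}", "aaaa"));  // aaaa
        System.out.println(firstMatch("a{2,3}", "aaaa")); // aaa
        System.out.println(firstMatch("a?", "aaa"));      // a
        // a* succeeds with zero a's at index 0, so the match is empty.
        System.out.println(firstMatch("a*", "baaa").isEmpty()); // true
    }
}
```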
@@ -239,9 +246,9 @@
*
* <br>
*
- * A*? Matches A 0 or more times (reluctant)
- * A+? Matches A 1 or more times (reluctant)
- * A?? Matches A 0 or 1 times (reluctant)
+ * A*? Matches A 0 or more times (reluctant)
+ * A+? Matches A 1 or more times (reluctant)
+ * A?? Matches A 0 or 1 times (reluctant)
*
* <br>
*
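A sketch contrasting greedy and reluctant closures, assuming java.util.regex as a stand-in for RE; the class and helper names are hypothetical. java.util.regex also accepts a reluctant counted closure such as a{2,4}?, which the RE javadoc states RE does not support, so that case applies only to the stand-in.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class ReluctantClosureDemo {
    // Returns the first substring matching regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b> (greedy)
        System.out.println(firstMatch("<.+?>", html)); // <b> (reluctant)
        System.out.println(firstMatch("a+?", "aaa"));  // a
        // Reluctant counted closure: java.util.regex only, not RE.
        System.out.println(firstMatch("a{2,4}?", "aaaa")); // aa
    }
}
```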
@@ -249,10 +256,11 @@
*
* <br>
*
- * AB Matches A followed by B
- * A|B Matches either A or B
- * (A) Used for subexpression grouping
- * (?:A) Used for subexpression clustering (just like grouping but no backrefs)
+ * AB Matches A followed by B
+ * A|B Matches either A or B
+ * (A) Used for subexpression grouping
+ * (?:A) Used for subexpression clustering (just like grouping but
+ * no backrefs)
*
* <br>
*
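A sketch of alternation, grouping, and clustering, assuming java.util.regex as a stand-in for RE; the class and helper names are hypothetical. It shows that a (?:...) cluster does not get a group number, so the first capturing group is the one after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class GroupingDemo {
    // Number of capturing groups in regex; (?:...) clusters are not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns {whole match, group 1} for the first match, or null if none.
    static String[] matchAndGroup1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(ab)+(c)"));   // 2
        System.out.println(groupCount("(?:ab)+(c)")); // 1
        String[] r = matchAndGroup1("(?:ab)+(c)", "xababc");
        System.out.println(r[0] + " " + r[1]);        // ababc c
        Matcher alt = Pattern.compile("cat|dog").matcher("hotdog");
        System.out.println(alt.find() ? alt.group() : "none"); // dog
    }
}
```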
@@ -260,15 +268,15 @@
*
* <br>
*
- * \1 Backreference to 1st parenthesized subexpression
- * \2 Backreference to 2nd parenthesized subexpression
- * \3 Backreference to 3rd parenthesized subexpression
- * \4 Backreference to 4th parenthesized subexpression
- * \5 Backreference to 5th parenthesized subexpression
- * \6 Backreference to 6th parenthesized subexpression
- * \7 Backreference to 7th parenthesized subexpression
- * \8 Backreference to 8th parenthesized subexpression
- * \9 Backreference to 9th parenthesized subexpression
+ * \1 Backreference to 1st parenthesized subexpression
+ * \2 Backreference to 2nd parenthesized subexpression
+ * \3 Backreference to 3rd parenthesized subexpression
+ * \4 Backreference to 4th parenthesized subexpression
+ * \5 Backreference to 5th parenthesized subexpression
+ * \6 Backreference to 6th parenthesized subexpression
+ * \7 Backreference to 7th parenthesized subexpression
+ * \8 Backreference to 8th parenthesized subexpression
+ * \9 Backreference to 9th parenthesized subexpression
*
* <br>
*
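A sketch of backreferences, reusing the ([0-9]+)=\1 expression from earlier in the javadoc and assuming java.util.regex as a stand-in for RE; the class and helper names are hypothetical. Pattern.matches requires the whole input to match, so "12=123" is rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class BackrefDemo {
    // True when the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first substring matching regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(whole("([0-9]+)=\\1", "123=123")); // true
        System.out.println(whole("([0-9]+)=\\1", "123=456")); // false
        System.out.println(whole("([0-9]+)=\\1", "12=123"));  // false
        // (\w)\1 finds the first doubled word character.
        System.out.println(firstMatch("(\\w)\\1", "balloon")); // ll
    }
}
```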
@@ -276,20 +284,21 @@
*
* <p>
*
- * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they
- * match as many elements of the string as possible without causing the overall
- * match to fail. If you want a closure to be reluctant (non-greedy), you can
- * simply follow it with a '?'. A reluctant closure will match as few elements
- * of the string as possible when finding matches. {m,n} closures don't currently
+ * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning
+ * that they match as many elements of the string as possible without
+ * causing the overall match to fail. If you want a closure to be
+ * reluctant (non-greedy), you can simply follow it with a '?'. A
+ * reluctant closure will match as few elements of the string as
+ * possible when finding matches. {m,n} closures don't currently
* support reluctancy.
*
* <p>
*
- * RE runs programs compiled by the RECompiler class. But the RE matcher class
- * does not include the actual regular expression compiler for reasons of
- * efficiency. In fact, if you want to pre-compile one or more regular expressions,
- * the 'recompile' class can be invoked from the command line to produce compiled
- * output like this:
+ * RE runs programs compiled by the RECompiler class. But the RE
+ * matcher class does not include the actual regular expression compiler
+ * for reasons of efficiency. In fact, if you want to pre-compile one
+ * or more regular expressions, the 'recompile' class can be invoked
+ * from the command line to produce compiled output like this:
*
* <pre>
*
@@ -309,14 +318,16 @@
*
* </pre>
*
- * You can then construct a regular expression matcher (RE) object from the pre-compiled
- * expression re1 and thus avoid the overhead of compiling the expression at runtime.
- * If you require more dynamic regular expressions, you can construct a single RECompiler
- * object and re-use it to compile each expression. Similarly, you can change the
- * program run by a given matcher object at any time. However, RE and RECompiler are
- * not threadsafe (for efficiency reasons, and because requiring thread safety in this
- * class is deemed to be a rare requirement), so you will need to construct a separate
- * compiler or matcher object for each thread (unless you do thread synchronization
+ * You can then construct a regular expression matcher (RE) object from
+ * the pre-compiled expression re1 and thus avoid the overhead of
+ * compiling the expression at runtime. If you require more dynamic
+ * regular expressions, you can construct a single RECompiler object and
+ * re-use it to compile each expression. Similarly, you can change the
+ * program run by a given matcher object at any time. However, RE and
+ * RECompiler are not threadsafe (for efficiency reasons, and because
+ * requiring thread safety in this class is deemed to be a rare
+ * requirement), so you will need to construct a separate compiler or
+ * matcher object for each thread (unless you do thread synchronization
* yourself).
*
* </pre>
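The compile-once, matcher-per-thread advice can be sketched with java.util.regex, whose documentation states that Pattern instances are immutable and safe for concurrent use while Matcher instances are not. This is an analogy under that assumption, not RE's or RECompiler's API; the class name, the countNumbers helper, and the sample inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class SharedPatternDemo {
    // Compiled once and shared by all worker threads.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Counts digit runs across all lines using a fixed pool of threads.
    static int countNumbers(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String line : lines) {
                results.add(pool.submit(() -> {
                    // Each task creates its own Matcher, which is not thread-safe.
                    Matcher m = NUMBER.matcher(line);
                    int n = 0;
                    while (m.find()) {
                        n++;
                    }
                    return n;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("a1 b22");
        lines.add("no digits");
        lines.add("3 4 5");
        System.out.println(countNumbers(lines, 2)); // 5
    }
}
```

Creating a fresh Matcher per task avoids any locking; reusing one Matcher across threads would require external synchronization, which is the same trade-off the javadoc describes for RE.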
@@ -326,20 +337,24 @@
* <i>ISSUES:</i>
*
* <ul>
- * <li>com.weusours.util.re is not currently compatible with all standard POSIX regcomp flags
- * <li>com.weusours.util.re does not support POSIX equivalence classes ([=foo=] syntax) (I18N/locale issue)
- * <li>com.weusours.util.re does not support nested POSIX character classes (definitely should, but not completely trivial)
- * <li>com.weusours.util.re Does not support POSIX character collation concepts ([.foo.] syntax) (I18N/locale issue)
- * <li>Should there be different matching styles (simple, POSIX, Perl etc?)
- * <li>Should RE support character iterators (for backwards RE matching!)?
- * <li>Should RE support reluctant {m,n} closures (does anyone care)?
+ * <li>com.weusours.util.re is not currently compatible with all
+ * standard POSIX regcomp flags</li>
+ * <li>com.weusours.util.re does not support POSIX equivalence classes
+ * ([=foo=] syntax) (I18N/locale issue)</li>
+ * <li>com.weusours.util.re does not support nested POSIX character
+ * classes (definitely should, but not completely trivial)</li>
+ * <li>com.weusours.util.re does not support POSIX character collation
+ * concepts ([.foo.] syntax) (I18N/locale issue)</li>
+ * <li>Should there be different matching styles (simple, POSIX, Perl etc?)</li>
+ * <li>Should RE support character iterators (for backwards RE matching!)?</li>
+ * <li>Should RE support reluctant {m,n} closures (does anyone care)?</li>
* <li>Not *all* possibilities are considered for greediness when backreferences
* are involved (as POSIX suggests should be the case). The POSIX RE
* "(ac*)c*d[ac]*\1", when matched against "acdacaa" should yield a match
* of acdacaa where \1 is "a". This is not the case in this RE package,
* and actually Perl doesn't go to this extent either! Until someone
* actually complains about this, I'm not sure it's worth "fixing".
- * If it ever is fixed, test #137 in RETest.txt should be updated.
+ * If it ever is fixed, test #137 in RETest.txt should be updated.</li>
* </ul>
*
* </font>
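The greediness issue in the last bullet can be observed in another backtracking engine. This sketch assumes java.util.regex as the stand-in; it does not show what RE itself returns, and the class and helper names are hypothetical. Group 1 greedily takes "ac", so [ac]* has to back off to empty before \1 can match, and the engine stops at "acdac" rather than the POSIX leftmost-longest "acdacaa".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex, not org.apache.regexp.RE.
class PosixBackrefDemo {
    // Returns {whole match, group 1} for the first match, or null if none.
    static String[] matchAndGroup1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] r = matchAndGroup1("(ac*)c*d[ac]*\\1", "acdacaa");
        // The first successful path found by backtracking wins, so the
        // alternative where group 1 is just "a" is never tried.
        System.out.println(r[0]); // acdac
        System.out.println(r[1]); // ac
    }
}
```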
@@ -348,7 +363,7 @@
* @see RECompiler
*
* @author <a href="mailto:[EMAIL PROTECTED]">Jonathan Locke</a>
- * @version $Id: RE.java,v 1.8 2001/02/20 01:18:45 jon Exp $
+ * @version $Id: RE.java,v 1.9 2001/03/09 22:17:13 jon Exp $
*/
public class RE
{