*Regular Expressions and VBScript*
Regular expressions provide a much more powerful and efficient way of
manipulating strings of text than combining a variety of standard string
functions. They have a reputation for being cryptic and difficult to learn,
but the basics are actually quite easy to pick up and use.
*The RegExp Object*
The RegExp object has three properties and three methods. The properties
are:
- Pattern property - holds the regular expression pattern
- Global property - True or False (default False). If False, matching
stops at first match.
- IgnoreCase property - True or False (default False). If True, allows
case-insensitive matching
The methods are:
- Execute method - executes a match against the specified string.
Returns a Matches collection, which contains a Match object for each
match. The Match object can also contain a SubMatches collection
- Replace method - replaces the part of the string found in a match with
another string
- Test method - executes an attempted match and returns True or False
To set up a RegExp object:
Dim re
Set re = New RegExp
With re
.Pattern = "some_pattern"
.Global = True
.IgnoreCase = True
End With
A Pattern can be any string value. For example, if the pattern is "Hello
World", the RegExp object will look for exactly that text in the target
string. If IgnoreCase is True, it will match any case, so "hellO wORld"
would be matched. If Global is set to True, it will continue to search the
string for all instances of "Hello World". If False, it will stop searching
after the first instance is found.
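The effect of Global is easiest to see with the Replace method. The following is a minimal sketch; the sample sentence and the variable names are only illustrative.

```vbscript
' Sketch: how the Global property changes the result of Replace.
Dim re, source, firstOnly, everyMatch
Set re = New RegExp
re.Pattern = "hello world"
re.IgnoreCase = True

source = "Hello World, hellO wORld"

re.Global = False
firstOnly = re.Replace(source, "Goodbye")    ' "Goodbye, hellO wORld"

re.Global = True
everyMatch = re.Replace(source, "Goodbye")   ' "Goodbye, Goodbye"
```

How the two results are displayed depends on the host, for example Response.Write in ASP or MsgBox elsewhere.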
Execute method, returning Matches collection
Dim re, targetString, colMatch, objMatch
Set re = New RegExp
With re
.Pattern = "a"
.Global = True
.IgnoreCase = True
End With
targetString = "The rain in Spain falls mainly in the plain"
Set colMatch = re.Execute(targetString)
For each objMatch in colMatch
Response.Write objMatch.Value & "<br />"
Next
The above will produce a list of 5 letter a's.
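Each Match object also records where the match was found. The sketch below assumes the same target sentence and builds a small text report using the Count property of the Matches collection and the FirstIndex, Length and Value properties of each Match. The pattern "ain" and the variable name report are illustrative choices.

```vbscript
Dim re, colMatch, objMatch, report
Set re = New RegExp
re.Pattern = "ain"
re.Global = True

Set colMatch = re.Execute("The rain in Spain falls mainly in the plain")

' Count is the number of Match objects in the collection (4 here).
report = colMatch.Count & " matches" & vbCrLf
For Each objMatch In colMatch
    ' FirstIndex is zero-based; Length is the number of characters matched.
    report = report & objMatch.Value & " at " & objMatch.FirstIndex & _
             ", length " & objMatch.Length & vbCrLf
Next
```

The first match in this sentence starts at index 5, the "ain" in "rain".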
Test method, returning True or False
Dim re, targetString
Set re = New RegExp
With re
.Pattern = "a"
.Global = False
.IgnoreCase = False
End With
targetString = "The rain in Spain falls mainly in the plain"
Response.Write re.Test(targetString)
The above will write True, because Test returns True as soon as it hits the
first instance of "a".
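Test is also a convenient way to see what IgnoreCase does. A minimal sketch, using an illustrative pattern "spain" against the same sentence:

```vbscript
Dim re, targetString, caseSensitive, caseInsensitive
Set re = New RegExp
re.Pattern = "spain"
targetString = "The rain in Spain falls mainly in the plain"

re.IgnoreCase = False
caseSensitive = re.Test(targetString)     ' False: "Spain" has a capital S

re.IgnoreCase = True
caseInsensitive = re.Test(targetString)   ' True: case is ignored
```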
Metacharacters
Metacharacters are special characters that can be combined with literal
characters (which is all that have been used so far) to extend the power of
Regular Expressions way beyond the simple examples already seen, and are
what set Regular Expressions apart from simple string functions.
Character - Description
- \ - Marks the next character as either a special character or a literal. For example, "n" matches the character "n". "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
- ^ - Matches the beginning of input.
- $ - Matches the end of input.
- * - Matches the preceding character zero or more times. For example, "zo*" matches either "z" or "zoo".
- + - Matches the preceding character one or more times. For example, "zo+" matches "zoo" but not "z".
- ? - Matches the preceding character zero or one time. For example, "a?ve?" matches the "ve" in "never".
- . - Matches any single character except a newline character.
- (pattern) - Matches pattern and remembers the match. The matched substring can be retrieved from the SubMatches collection of the resulting Match object, using Item(0)...Item(n). To match parentheses characters ( ), use "\(" or "\)".
- x|y - Matches either x or y. For example, "z|wood" matches "z" or "wood". "(z|w)oo" matches "zoo" or "woo".
- {n} - n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the first two o's in "foooood".
- {n,} - n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" and matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
- {n,m} - m and n are nonnegative integers. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?".
- [xyz] - A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
- [^xyz] - A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".
- [a-z] - A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range "a" through "z".
- [^m-z] - A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range "m" through "z".
- \b - Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
- \B - Matches a non-word boundary. "ea*r\B" matches the "ear" in "never early".
- \d - Matches a digit character. Equivalent to [0-9].
- \D - Matches a non-digit character. Equivalent to [^0-9].
- \f - Matches a form-feed character.
- \n - Matches a newline character.
- \r - Matches a carriage return character.
- \s - Matches any white space including space, tab, form-feed, etc. Equivalent to "[ \f\n\r\t\v]".
- \S - Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]".
- \t - Matches a tab character.
- \v - Matches a vertical tab character.
- \w - Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
- \W - Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
- \num - Matches num, where num is a positive integer. A reference back to remembered matches. For example, "(.)\1" matches two consecutive identical characters.
- \n (n octal) - Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long. For example, "\11" and "\011" both match a tab character. "\0011" is the equivalent of "\001" & "1". Octal escape values must not exceed 256. If they do, only the first two digits comprise the expression. Allows ASCII codes to be used in regular expressions.
- \xn - Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" & "1". Allows ASCII codes to be used in regular expressions.
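A few of the metacharacters above can be tried out with a small helper. FirstMatch is a hypothetical function written for this sketch, not part of the RegExp object; it returns the text of the first match, or an empty string when nothing matches.

```vbscript
' Hypothetical helper: first matched text for a pattern, or "" if no match.
Function FirstMatch(pattern, text)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        FirstMatch = matches(0).Value
    Else
        FirstMatch = ""
    End If
End Function

Dim boundaryHit, boundaryMiss, quantified, doubled, negated, digits
boundaryHit  = FirstMatch("er\b", "never")              ' "er"
boundaryMiss = FirstMatch("er\b", "verb")               ' "" (no word boundary after "er")
quantified   = FirstMatch("o{1,3}", "fooooood")         ' "ooo"
doubled      = FirstMatch("(.)\1", "hello")             ' "ll"
negated      = FirstMatch("[^abc]", "plain")            ' "p"
digits       = FirstMatch("\d+", "Order 12345 shipped") ' "12345"
```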
Examples
\d+ will match one or more consecutive digits, and is equivalent to [0-9]+
<[^>]*> will match any html tag: it looks for an opening "<", followed by
any number of characters that are not a closing ">", followed finally by a
closing ">". It uses a "negative character set", [^>].
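As a rough illustration, the tag pattern can be combined with Replace to strip markup from a string. The sample html below is made up for the sketch.

```vbscript
Dim re, html, plainText
Set re = New RegExp
re.Pattern = "<[^>]*>"
re.Global = True   ' remove every tag, not just the first

html = "<p>Hello <b>World</b></p>"
plainText = re.Replace(html, "")   ' "Hello World"
```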
Constructing a RegExp pattern
Form input validation is a key area in which regular expressions can be
used, and a common task is to validate the structure of an email address.
Initially, the task needs to be broken down into its constituent rules:
- Must have 1 or more letters or numbers
- Can have underscores, hyphens, dots, apostrophes
- Must have an "@" sign following this
- First part of domain name must follow the "@", and must contain at
least 3 letters or numbers
- May contain underscore, dots or hyphen
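- There must be at least one dot, which must be followed by the TLD.
"^[\w\-'.]+@[\w.\-]{3,}\.[a-z]+$" (with IgnoreCase set to True) will do it,
but can be improved upon depending on how specific you want to be. The ^ and
$ anchors force the whole input to match, and inside square brackets "." and
"?" are literal characters rather than metacharacters, so no "?" is needed
there.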
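The rules can be wrapped in a small validation function built on the Test method. This is only a sketch: LooksLikeEmail is a hypothetical name, and the pattern "^[\w\-'.]+@[\w.\-]{3,}\.[a-z]+$" is one assumed reading of the rules listed above, not a complete email validator.

```vbscript
' Hypothetical helper: True if address fits the rules listed above.
Function LooksLikeEmail(address)
    Dim re
    Set re = New RegExp
    ' ^ and $ make the pattern cover the whole input, not just part of it.
    re.Pattern = "^[\w\-'.]+@[\w.\-]{3,}\.[a-z]+$"
    re.IgnoreCase = True   ' accept upper-case letters in the TLD
    LooksLikeEmail = re.Test(address)
End Function

Dim okAddress, noAtSign, shortDomain
okAddress   = LooksLikeEmail("john.o'brien@example.com")  ' True
noAtSign    = LooksLikeEmail("no-at-sign.example.com")    ' False
shortDomain = LooksLikeEmail("a@b.c")                     ' False: domain part too short
```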
SubMatches collection
There will be instances where, once a match is found, you want to extract
parts of that match for later use. As an example, suppose you have an html
page which contains a list of links:
<a href="somepage.asp?id=12345">Company A</a><br />
<a href="somepage.asp?id=45678">Company B</a><br />
<a href="somepage.asp?id=66745">Company C</a><br />
<a href="somepage.asp?id=33471">Company D</a><br />
<a href="somepage.asp?id=90765">Company E</a><br />
...
The required parts are the company name and the id in the querystring, which
need to be collected and, for example, inserted into a database. The html is
passed in as strSearchOn, and the pattern uses parentheses to capture each
item: ([0-9]{5}) captures the id, which is a 5 digit number, and ([\w\s]+)
collects a series of letters, digits, underscores and spaces, stopping when
the opening angle bracket of the closing tag (</a>) is reached.
Set objRegExpr = New RegExp
' The dot in "somepage.asp" is escaped so that it only matches a literal "."
objRegExpr.Pattern = "somepage\.asp\?id=([0-9]{5})" & Chr(34) & ">([\w\s]+)"
objRegExpr.Global = True
objRegExpr.IgnoreCase = True
Set colMatches = objRegExpr.Execute(strSearchOn)
For Each objMatch In colMatches
    id = objMatch.SubMatches(0)        ' first parenthesised group: the id
    company = objMatch.SubMatches(1)   ' second group: the company name
    sql = "Insert Into table (idfield, company) Values (" & id & ",'" & _
          company & "')"
    conn.Execute sql
Next
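To try the SubMatches collection without a database connection, the same idea can be run against a short literal string. In this sketch the two-line html sample and the result format ("id=company;") are assumptions made for illustration.

```vbscript
Dim re, html, objMatch, result
' Two sample links; doubled quotes ("") are literal quotes inside a string.
html = "<a href=""somepage.asp?id=12345"">Company A</a><br />" & vbCrLf & _
       "<a href=""somepage.asp?id=45678"">Company B</a><br />"

Set re = New RegExp
re.Pattern = "somepage\.asp\?id=([0-9]{5})"">([\w\s]+)"
re.Global = True
re.IgnoreCase = True

result = ""
For Each objMatch In re.Execute(html)
    ' SubMatches(0) is the id, SubMatches(1) is the company name.
    result = result & objMatch.SubMatches(0) & "=" & _
             objMatch.SubMatches(1) & ";"
Next
' result is now "12345=Company A;45678=Company B;"
```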