Thanks for the contribution -- I always learn something worthwhile
from seeing really different approaches to a problem!
[EMAIL PROTECTED] wrote:
>
> Apologies in advance if I'm missing the mark (haven't been
> able to really read this or any other thread very deeply
> lately), but isn't the situation you are addressing one of sets:
>
I can see why it would appear that way (especially given the simple
examples to date), but there's much more to it. (I suppose we could
say "sets in context"; I'd have to think hard about whether that
would be adequate.)
Each regular expression does, in fact, describe a set of strings,
but does so by describing the syntax required of any string in
(or, as we'll cover in a moment, out of) the target set. Since
these sets can be arbitrarily large, it's often infeasible to
perform the test by explicit operations over the set of possible
valid strings.
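To make that concrete, here is a rough sketch of treating an RE as a
membership test rather than building the set itself. It uses Python's
"re" module purely for illustration (not the slash-delimited notation
in the examples of this message), and the name in_target_set is a
hypothetical helper of mine, not anything from your code:

```python
import re

# One of the patterns from the Examples list in this message.
ARCHIVE_SUFFIX = re.compile(r"(zip|gz)$")

def in_target_set(candidate):
    """Return True if candidate belongs to the set the RE describes."""
    # search() scans the whole string; the $ anchor pins the match to the end.
    return ARCHIVE_SUFFIX.search(candidate) is not None

# The set of strings ending in "zip" or "gz" is unbounded, so we never
# enumerate it; we only ask whether a given string satisfies the syntax.
print(in_target_set("backup.tar.gz"))   # True
print(in_target_set("gzip-notes.txt"))  # False
```

The point is only that the test costs one scan of the candidate
string, however large the described set happens to be.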
The regular expression (RE) itself specifies whether the context
of the explicitly-matched characters is relevant, and whether the
match is a "succeed if matched" or "succeed if not matched" test.
The limitation of the code you presented is that it ignores some
of the context and partial-description issues. Let me give a
few slightly more interesting examples, and ask if you can see a
simple way to extend your code to handle them.
Examples:
(The slashes are simply the default RE delimiters -- I'm using
them to avoid bogging down in escape character issues)
/(zip|gz)$/
will match any string that ENDS in "zip" or "gz";
/^(reb|lis|sch)/
matches any string that BEGINS with one of the three-letter
combinations listed;
/^(larry|curly|moe)$/
will only match the three words EXACTLY as given (beginning to
end, with no prefixes or suffixes allowed);
/(ix|ux)/
will match any string that contains either of the two-letter
combinations ANYWHERE, regardless of context (as in "unix",
"unixes", "Nixon", etc.); and
/^(?!(Win|win|MS))/
will match the name of any state-of-the-art operating system
(by excluding "windows", "Win32", "Windows 2000", "MS-DOS",
etc. ;-)
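
If you'd like to try these five against your own code, here is a
small harness, again only a sketch in Python's "re" module. The
dictionary keys and the matches() helper are names I made up for
illustration. I use search() rather than match() so that the anchors
in each RE, and not the library call, decide how much context counts:

```python
import re

# The five REs from the examples above, without the slash delimiters.
# The keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "ends_with":     r"(zip|gz)$",
    "begins_with":   r"^(reb|lis|sch)",
    "exact_word":    r"^(larry|curly|moe)$",
    "anywhere":      r"(ix|ux)",
    "not_beginning": r"^(?!(Win|win|MS))",
}

def matches(label, text):
    """Return True if the RE stored under label matches text."""
    return re.search(PATTERNS[label], text) is not None

print(matches("ends_with", "archive.zip"))      # True
print(matches("begins_with", "english"))        # False: "lis" is not at the start
print(matches("exact_word", "moes"))            # False: suffix not allowed
print(matches("anywhere", "Nixon"))             # True: contains "ix"
print(matches("not_beginning", "MS-DOS"))       # False: excluded prefix
print(matches("not_beginning", "Linux"))        # True: no excluded prefix
```

Note that the last RE is a "succeed if not matched" test: the
lookahead consumes no characters, so it accepts every string
(including the empty one) that does not start with an excluded prefix.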
-jn-