[REBOL] Regular Expressions Re:

KGD03011 Tue, 4 Jan 2000 10:06:31 -0800

(Message previously posted with wrong subject. Sorry! - I'll just edit it a
bit before reposting.)

Hi Keith,

>Hi, just wondering about regular expressions in Rebol.
>
>1. Is there any support for regular expressions planned (or is Rebol only
>going to support 'parse for everything regular expressions might be used for
>that isn't already taken care of by words like 'find.)?

Funny you should ask. I just posted search-text.r to rebol.org, which is an
attempt to simulate regular expressions in REBOL. Please have a look at it
and tell me what you think.

http://www.rebol.org/utility/search-text.r

I use regular expressions in block form, and have tried to make them as
similar as possible to parse rules. Don't have any text capture yet, except
for the whole match. That's definitely needed, but I haven't figured out how
to do it yet. You can capture text in parse rules using 'copy, but this
capture occurs regardless of whether the rule succeeded overall, or whether
the portion of rule capturing the text contributed to the overall match or
not.

I think it would be very difficult to translate regular expressions that
combine the * and + operators in complicated ways, but a very useful
trick is:

pp: copy []
rule: [something like a regex]
parse string [some [p: rule (append pp p) | skip]]

This use of 'parse will always return true (since 'skip will take you to the
end of the string even if 'rule never matches), but 'pp will contain a list
of pointers to all the places that matched rule. Unfortunately, if you want
to have a rule like:

non-space: complement make bitset! " ^/^-"
["t" some non-space "t"]

you'll never match, because "t" is itself a non-space character: 'some
'non-space runs right through the closing "t" and leaves nothing for the
final "t" to match. You need to do something like

["t" some ["t" (append pp p) | non-space]]

Even some rather straightforward regular expressions would require extremely
complex parse rules.
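
For concreteness, here is a minimal sketch that puts the position-collecting
trick and the "t" workaround together in one script. It is only an
illustration under some assumptions: it targets REBOL 2, it uses parse/all so
that whitespace is treated like any other character, it stores index? p
rather than the series pointer p so the result is easier to read, and the
sample string is made up.

```rebol
REBOL [Title: "Collecting match positions with parse (sketch)"]

non-space: complement make bitset! " ^/^-"
pp: copy []     ; indexes of the places where the rule matched

; Inside the loop, every "t" after the opening one closes a candidate
; match, so the start position p is recorded each time that happens.
rule: ["t" some ["t" (append pp index? p) | non-space]]

text: "a test of tt"     ; made-up sample input
parse/all text [some [p: rule | skip]]
probe pp
```

Because the recording happens inside the rule, the outer loop does not append
again. A single starting point can still be recorded more than once if
several "t" characters follow it before the next whitespace.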

Regexes have automatic backtracking - a regex engine will try any possible
combination of the elements you specify to make a match. Parse rules don't do
that - parse will just plow through your rule unless you explicitly say, "and
if that didn't work, back up to this point and try this." Parse rules are
great when you've got structured data meant to be read by a computer, since
regexes can find devilish ways to match things that shouldn't match. Regexes
are better and much more concise when you're just looking for combinations of
words in natural languages.
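
To make the "back up to this point and try this" idea concrete, here is a
small sketch, again assuming REBOL 2 parse semantics (parse/all, and an
alternative after | starting over from the position where its block began).
The names greedy, t-rest and t-word are hypothetical, chosen only for this
illustration; this is one possible way to spell out backtracking by hand, not
necessarily how search-text.r does it.

```rebol
REBOL [Title: "Explicit backtracking in parse (sketch)"]

non-space: complement make bitset! " ^/^-"

; Looks like the regex t\S+t, but 'some never gives characters back,
; so nothing is left for the closing "t" to match.
greedy: ["t" some non-space "t"]

; Recursive alternatives spell the backtracking out by hand:
; first take one character and recurse; if that path fails, restart
; from the same position, take one character and require a "t" next.
t-rest: [non-space t-rest | non-space "t"]
t-word: ["t" t-rest]

print parse/all "test" greedy    ; 'some swallows the final "t"
print parse/all "test" t-word    ; the second alternative can end the match
```

The first alternative is tried before the second, so t-word prefers the
longest match and only gives characters back when it has to, which is roughly
what a regex engine does automatically for a greedy + operator.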

Sorry for all this raving, and twice! Anyway, I'd be happy to exchange more
ideas on what should be done!

See you around,
Eric