[REBOL] Regular Expressions Re:

ole_f Tue, 4 Jan 2000 12:28:49 -0800
Hi Keith, 4-Jan-2000 you wrote:

>1. Is there any support for regular expressions planned (or is Rebol only
>going to support 'parse for everything regular expressions might be used for
>that isn't already taken care of by words like 'find.)?

Parse does even more than you want, so there's no point in this.

>2. I've heard that 'parse actually supports a superset of regular
>expressions. Is this true?

That's right. Parse can be used for context-free languages, which are a
superset of the regular languages.

This is necessary, for example, to match strings with balanced parentheses.
This is easy with Parse:

>> b: ["(" b ")" | ""]
== ["(" b ")" | ""]
>> parse "((()))" b
== true
>> parse "()))" b
== false

But it's simply not possible with regular expressions. (Yes, I've heard that
they've put a hack into Perl which "fixes" this, but it really is just a
hack.)
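Note that the rule b above only accepts fully nested strings such as
"((()))". A balanced string like "()()" is rejected, because b stops after
the first "()" and the end of the input is never reached. Below is a minimal
sketch of a rule that accepts any balanced string, including the empty one.
It assumes REBOL/Core 2.x Parse semantics, and the name balanced is just for
illustration:

```rebol
; A balanced string is zero or more groups of "(" balanced ")"
balanced: [any ["(" balanced ")"]]

parse "()()" balanced      ; == true
parse "(()())" balanced    ; == true
parse "(()" balanced       ; == false, missing a closing paren
parse "())" balanced       ; == false, a stray ")" is left over
```

Each group in the ANY loop consumes at least one "(", so the recursion
always makes progress and cannot loop forever.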

>3. If there are no plans to build regular expressions into the language,
>would anyone be interested if someone were to develop a regular expression
>capability?

What's the point? It's dead easy to convert regular expressions into the
BNF-like notation that Parse uses:

"abc" becomes ["abc"]
[abc] becomes ["a" | "b" | "c"], or you could make a bitset for this:
abc-set: charset "abc"
and then the Parse block becomes [abc-set]. The Charset function can, as far
as I know, cope with all the features you can normally use in a character
class definition, such as:
abc-set: charset [#"a" - #"c"]
and with the Complement function, you can create negated character classes as
well. For example, this creates the charset corresponding to [^abc]:
complement charset "abc"
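Here is a small sketch of these charsets used inside Parse blocks, assuming
REBOL/Core 2.x. The names abc-set and not-abc are only illustrative. It uses
parse/all so that whitespace is not skipped.

Unlike a regex search, Parse has to match the whole input. That is why
"cabbage" fails below even though it starts with characters from the set.

```rebol
abc-set: charset [#"a" - #"c"]        ; same set as charset "abc"
not-abc: complement charset "abc"     ; like the regex class [^abc]

parse/all "cabba" [some abc-set]      ; == true
parse/all "cabbage" [some abc-set]    ; == false, "g" is not in the set
parse/all "xyz" [some not-abc]        ; == true
```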

Let a, b be regular expressions and r, s be the corresponding Parse blocks:
a* becomes [any r]
a+ becomes [some r]
(a|b) becomes [r | s]
ab becomes [r s]

At least, that's the way it should be, I guess. I'm not that familiar with
how regular expressions are written in, e.g., Python and Perl, though. I've
only used regular expressions in (f)lex and for theoretical purposes.
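As a worked example of the table above, here is (ab|c)+d* written as a Parse
block, sketched under REBOL/Core 2.x semantics. The name rule is only for
illustration.

One caveat applies: ANY and SOME are greedy and never give input back, so the
translation is not exact when a regex relies on backtracking. The regex a*a
matches "aaa", but [any "a" "a"] fails on it. In that case it helps to
rewrite the pattern first; a*a describes the same strings as a+.

```rebol
; (ab|c)+d*  ->  one or more of "ab" or "c", then any number of "d"
rule: [some ["ab" | "c"] any "d"]

parse/all "abcabdd" rule          ; == true
parse/all "abx" rule              ; == false

; a*a: ANY eats every "a", leaving none for the final "a"
parse/all "aaa" [any "a" "a"]     ; == false
; a*a is the same language as a+
parse/all "aaa" [some "a"]        ; == true
```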

>4. If there is interest, what are your thoughts on it? How would it work,
>look, etc?

>maybe there could be a 'regex word, or possibly have it as a refinement to
>'parse, as in parse/regex. (Is it even possible to add a refinement to a
>native word?)

>a skeleton of 'regex might look like this:

>regex: make function! [
>    string [string!] "The string to run the regex on"
>    pattern [string! block!] "The regular expression pattern"
>]
>[
>    ;do matching
>]

>'regex could return a block of values for patterns that have parentheses in
>them, so with

>string: "Hello, World. This is your captain speaking."
>regex string "(H.+),.+(c.+) "

>regex returns ["Hello" "captain"]

Shouldn't it also be possible to tell whether the regular expression fails to
match the whole string, as in the example above (where "speaking" is not matched)?
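For this particular sentence, you can pull out the same two pieces with COPY
inside a Parse block. This is a hand-written rule for this one input, not a
general translation of the regex. The names greeting and rank are just for
illustration, and REBOL/Core 2.x is assumed.

The rule also answers the whole-string question. Parse returns true only when
the rule reaches the end of the input. If you drop the final TO END, Parse
returns false, although the copies have still been made.

```rebol
string: "Hello, World. This is your captain speaking."

parse/all string [
    copy greeting to ","    ; "Hello"
    thru ","
    to "c"                  ; the first "c" after the comma is in "captain"
    copy rank to " "        ; "captain"
    to end                  ; consume the rest so parse can return true
]
; == true, greeting is "Hello" and rank is "captain"

; Without TO END, "speaking." is left over, so parse reports false
parse/all string [copy greeting to "," thru "," to "c" copy rank to " "]
; == false
```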

>Anyway, these are just some quick sketches. Does anyone think it would be
>worth it were someone to develop it? Would it be so slow that it would be
>useless? Does anyone have any other thoughts they want to share?

Please don't get me wrong, but I don't think it's worth it. Better to learn
Parse instead: it's very easy to use, and Parse blocks are easier to read
than spooky regular expressions. Besides, you get the added power, which
you'll soon find yourself using.

>Also, quick question about 'parse... does 'parse ever return a true value
>for anything other than simple splitting?

If you do more than simple splitting, it returns true or false, depending on
whether or not your Parse block has matched the entire input string. (See the
balanced parentheses example above.)

If you want results, you can copy parts of the input string from inside the
Parse block using the COPY keyword. Have a look at the REBOL manual (on the
web site) for how to do this and a lot more.
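Here is a minimal sketch of collecting results into a block while parsing. It
uses COPY plus a paren action that runs each time a field is matched. It
assumes REBOL/Core 2.x, and the names items and item are only illustrative.

```rebol
items: copy []
parse/all "red,green,blue" [
    any [copy item to "," (append items item) skip]   ; each field before a comma
    copy item to end (append items item)               ; the last field
]
; == true, items is ["red" "green" "blue"]
```

When no comma is left, TO "," fails and ANY stops. The position then drops
back to the start of "blue", which the second COPY picks up.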

Kind regards,
-- 
Ole Friis <[EMAIL PROTECTED]>

"Ignorance is bliss"
(Cypher, The Matrix)