[REBOL] Regular Expressions Re:(2)

ole_f Fri, 7 Jan 2000 13:30:17 -0800
Hi Eric, 7-Jan-2000 you wrote:

>Ole, you suggested this to find a word ending in "ing":

>>> a: [skip a | "ing"]
>== [skip a | "ing"]
>>> parse "ringin" a
>== false
>>> parse "ringing" a
>== true

>That's fine, except it won't detect a word ending in "ing" in the
>middle of the string:

>>> parse "ringing bells" a
>== false

Ah, then I just got you wrong. The easy way to do the above is first to split
the text up with

parse str none

and then match the individual words. However, this should work too:

sep: charset " ,.!?" ; and whatever else you want to split up words
b: [skip b | "ing"]
a: [b | to "ing" [sep to end | a]]
parse str a

though it's completely untested, and I agree that it's ugly :-)

>Actually, I think your suggestion is equivalent to the rule:

>    b: [to "ing"]

No, as this rule wouldn't return true on the string "ringing".

>Also, recursion is not so good for very long strings:

>>> str: "ringing"
>== "ringing"
>>> parse str a
>== true                             ; OK
>>> insert str "- - - - -"
>== "ringing"
>>> parse str a
>== true                             ; still OK
>>> insert/dup str "- - - - -" 10
>== "- - - - -ringing"
>>> parse str a
>== true                             ; still OK
>>> insert/dup str "- - - - -" 100
>== {- - - - -- - - - -- - - - ...
>>> parse str a
>== false                            ; stops working

>I expected a stack overflow error, but it just stopped working. Bug?

Strange. Don't know what happened, but you're right, recursion is really a
problem with REBOL as it offers only limited depth.

[...]
>>> parse "This is the straw that broke the camel's back" rule
>straw that broke the camel's back
>== true
>>> parse "I need some straw, and a camel" rule
>== false

>This is also very complex with PARSE, but trivial with regular
>expressions.

True. If Parse supported backtracking, it would just be a case of

parse str [any skip "straw" any non-punct "camel" to end]

>Ole wrote:

>>Converting "(string)*" and "(string)+", for example, into this:
>>
>>some-string: ["string" some-string | "string"]
>>any-string: ["string" any-string | none]
>>
>>should work, shouldn't it? But yes, I must admit, too, that 'any and 'some
>>ought to do some backtracking - in fact, I believed they did, so please
>>ignore the mail I sent a couple of days ago on how to easily build Parse
>>blocks from regular expressions ;-)
>>
>>Rebol Tech., isn't the lack of backtracking in 'any and 'some a bug?

>I'm not sure what the advantage of your suggestion would be over
>converting "(string)*" to [any "string"] and "(string)+" to [some

I actually had an idea with it, but I guess I need to think a little more
about it, as it doesn't seem quite so clear to me anymore ;-)

[...]
>The rule QUOTED-STR needs to be adjusted of course, depending on what
>you want to do, and there are always problems with how to deal with
>illegal strings (should you allow newlines in the middle or not etc).
>Still, parse rules make this kind of work easy, and backtracking can
>only be a complicating factor.

You're right on this one, i.e. that without backtracking, it might be easier
to understand exactly what's happening.

>>BTW, by building a DFA for a regular expression, you can get rid of
>>the backtracking. However, I don't know how fast a program can build
>>a DFA, it might not be that fast... but if you try to match one
>>regular expression to a lot of strings, it might be worth a go. Keep
>>in mind, though, that then you'll only be able to get a "matched" or
>>"didn't match" answer from your expression matching, it will probably
>>be very difficult to implement copying of the text inside
>>subexpressions etc.

>I've never read anything on how DFA's actually work, but it sounds real
>complicated. Search-text.r implements an NFA, of course, but even then I
>haven't figured out to copy text matching portions of the expression. If
>anyone would give me some hints, I'd be grateful.

Unfortunately I haven't ever tried to actually write a program which turns an
NFA into a DFA, but there are relatively simple algorithms to do so. I might
try to give it a go when I get the time, but unfortunately that's not now
(due to exams).

If you have an NFA with states s1, s2, ..., sn, you can build a DFA in which
the states are all possible subsets of {s1, s2, ..., sn}. The intuition is
this: When you feed some input stream to your NFA, it can be in a number of
states at once. Just select the state in your DFA that corresponds to the set
of states your NFA is in.

To decide which transitions you need between the states in your DFA given an
input symbol: For each element s in the set corresponding to the DFA state,
see which states you can go into from the NFA in state s. Take the union of
all the states you get, and you get another state in your DFA, and you've got
a transition between the two DFA states.

The starting state in your DFA is the state containing exactly the states you
can reach in your NFA given the empty string.

I hope the above is of any use, and not too messy ;-)

Kind regards,
--
Ole Friis <[EMAIL PROTECTED]>

"Ignorance is bliss"
(Cypher, The Matrix)
[REBOL] Regular Expressions Re:(2)

Reply via email to