[REBOL] Regular Expressions Re:(5)

Petr . Krenzelok Sat, 8 Jan 2000 17:04:24 -0800


[EMAIL PROTECTED] wrote:

> Hi Petr, 8-Jan-2000 you wrote:
>
> [...]
> >> Ah, then I just got you wrong. The easy way to do the above is first to split
> >> the text up with
> >>
> >> parse str none
> >>
> >> and then match the individual words. However, this should work too:
> >>
> >> sep: charset " ,.!?" ; and whatever else you want to split up words
> >> b: [skip b | "ing"]
> >> a: [b | to "ing" [sep to end | a]]
> >> parse str a
> >>
> >> though it's completely untested, and I agree that it's ugly :-)
>
> >This will not work imho :-)
>
> You're right. As I said, it was completely untested ;-)
>
> I used "to" instead of "thru", and we need the /all refinement. But apparently
> that's still not enough, though I don't know why...
>

As I said, it doesn't matter whether you used 'to or 'thru, because once you are
back from 'b's maddening recursion :-), you are standing just behind the last
occurrence of "ing" in the parsed string ...
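
One way to check where the pointer ends up is to drop a marker right after 'b.
This is only a sketch: the word 'pos is my own name, not something from the
earlier mails, and the values in the comments are what I expect from the
console logs quoted further down, not fresh output.

```rebol
b: [skip b | "ing"]
str: "ringing bells"

; pos: records the input position right after b has matched;
; "to end" only lets parse finish, it does not move the marker
parse/all str [b pos: to end]

print mold pos      ; expected: " bells"
print index? pos    ; expected: 8
```

So whatever follows 'b in the rule starts at " bells", not at the head of the
string.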

>
> To make things even more spooky, look at this:
>
> sep: charset " ,.!?"
> b1: [skip b1 | "ing"]
> b2: [skip b2 | "ing" sep to end]
> a: [b1 | b2]
>
> The idea here is that b1 will match any string ending in "ing", and b2 will
> match any string which contains "ing" followed by a separator. Now, watch
> this:
>
> >> parse/all "ringing bells" b2
> == true

It's just the same as 'b1, except that it also contains the code to skip to the
end of the string ....

> >> parse/all "ringing bells" a
> == false
>
> Since the parse with b2 returned true, the parse with 'a should indeed be
> true, too, since 'a is true if either b1 or b2 gives true. Now, if instead I
> define 'a as [b2 | b1], we get true with 'a instead.
>
> I just cannot see why this should be so?
>

look, once again (look also at console outputs in previous emails):

b1: [skip b1 | "ing"]

1) b1 calls itself recursively after skip is performed .....
2) it's clear when skip fails - it happens once we reach the end of the string,
then parse tries the second option: | "ing"
3) it fails too, as parse can't match "ing" at the end of the string

end of saga ... we are standing at the end of the string, parse rule failed, so
let's return false one level higher ....

4) false is returned one level higher [skip b1 |  .... can you see it? The left
part of the level above is false ....
5) time to try to apply | "ing"

Not matched? Repeat it .... simply said, we keep coming back from the recursion
until we reach "ing bells" (look at the logs once again)

6) so b1 is true for the first time and we are standing just behind the "ing"
string, so at beginning of "bells", or " bells", once parse/all is used ....

and now back to parse/all a

a is a subrule!

The b2 part of this subrule is NEVER applied. Know why? Just because at some point
we returned true from b1, so there is no need to perform the OR option. The 'false
is there because, after b1, all we still needed to do was skip to the end of the
string ...
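
To put the two orderings side by side, here is a minimal sketch using the rules
from your mail. The results in the comments are the ones you reported yourself;
the explanations next to them are my reading of them.

```rebol
sep: charset " ,.!?"
b1: [skip b1 | "ing"]
b2: [skip b2 | "ing" sep to end]

; b1 succeeds and leaves us at " bells"; b2 is never tried,
; and because we are not at the end, parse returns false
parse/all "ringing bells" [b1 | b2]    ; == false

; b2 goes first: it matches "ing", then the space, then runs to the end
parse/all "ringing bells" [b2 | b1]    ; == true
```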

> >as for "ringing bells"
>
> >Again, you are going to reach the end of the string, then go back recursively
> >until the first "ing" (counted from the end of the string) is matched:
>
> >>> b: [skip markb: (print ["markb: " markb index? markb]) b | back-b: (print
> >["back-b: " back-b index? back-b]) "ing"]
> >== [skip markb: (print ["markb: " markb index? markb]) b | back-b: (print
> >["back-b: " back-b index? back-b]) "ing"]
> >>> parse str a
> [...]
> >back-b:  g bells 7
> >back-b:  ng bells 6
> >back-b:  ing bells 5
> >== false
>
> >Your code will fail with something like "ringing sounding bell" ... it will match
> >sounding, so generally said - the last occurrence of "ing" contained in the string
> >...
>
> It should then try to match the part to the right of the "|" in the top rule
> (rule a), which should match. But that doesn't seem to happen.

No, why? | means OR, not continuation of the rule. There is no need for the OR, as
the first part of your rule returned 'true. The parser then continues after the
closing ] of the rule. Remember, [blabla | bleble] is just a subrule: once blabla
is true, parse skips behind the closing ] ...
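
A much smaller case shows the same thing, without any recursion. This is my own
example, not taken from the earlier mails, and the results in the comments are
what I expect from the behaviour described above:

```rebol
; "a" matches, so the alternative after | is never tried;
; the input is not used up, so parse reports false
parse/all "ab" ["a" | "ab"]    ; expected: false

; with the longer alternative first, the whole input is consumed
parse/all "ab" ["ab" | "a"]    ; expected: true
```

Parse does not come back to try the other side of | just because the input was
not consumed to the end.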

> >And because of that, second part of 'a - {to "ing" [sep to end | a]} will be NEVER
> >applied, as your pointer is just behind last occurrence of "ing" contained in your
> >string ... that's why you got false result.
>
> So the second part of 'a will pick up where the first part left? If that's the
> case, then Parse _really_ has a bug, since the second part of 'a should pick
> up at the very same spot as where the first part started, which is the start
> of the entire string.
>

see above ...

> >change 'a to [b to end] and once successfully back from 'b, it will continue "to
> >end" and return "true" ...
>
> But that's not what it should do. Then the rule would match all strings with
> "ing" in it.

Yes, backtracked from the tail of the string ...

> We only want to match strings that either end in "ing" or have a
> word inside it (words are separated by spaces, commas, etc.) that end with
> "ing".
>

I don't understand. For the above, parse/all "ringing bells" [thru "ing" to end]
should be sufficient ...

Well, if you want to return false for str: "ringdong bells", that's another story
....

parse/all str [thru "ing " to end]

and if it doesn't cover "ringdong bellsing", then just add a space to the parsed
string before parse is applied ... what a hack :-)
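
Put together, the hack could look something like the sketch below. The function
name ends-ing? is just something I made up for illustration, it only treats a
space as a separator (no commas and so on), and the results in the comments are
what I expect, not console output.

```rebol
; ends-ing? is a hypothetical helper name, not from the thread
ends-ing?: func [str [string!]] [
    ; join returns a new string, so the caller's str is left untouched;
    ; the appended space lets a trailing "ing" match the "ing " pattern
    parse/all join str " " [thru "ing " to end]
]

ends-ing? "ringing bells"       ; expected: true
ends-ing? "ringdong bells"      ; expected: false
ends-ing? "ringdong bellsing"   ; expected: true
```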

Have a nice day,

-pekr-

>
> Kind regards,
> --
> Ole Friis <[EMAIL PROTECTED]>
>
> "Ignorance is bliss"
> (Cypher, The Matrix)