[REBOL] Regular Expressions Re:(7)

Petr Krenzelok Sun, 9 Jan 2000 05:24:39 -0800


[EMAIL PROTECTED] wrote:

> Hi Petr, 9-Jan-2000 you wrote:
>
> [...]
> >> >This will not work imho :-)
> >>
> >> You're right. As I said, it was completely untested ;-)
> >>
> >> I used "to" instead of "thru", and we need the /all refinement. But apparently
> >> that's still not enough, though I don't know why...
> >>
>
> >As I said - it doesn't matter if you used 'to or 'thru, because once you are back
> >from 'b's maddening :-), you are standing just behind the last occurrence of "ing"
> >in the parsed string ...
>
> Erm, we should be standing just in front of it, otherwise there's no point?

OK, you are right: you are standing in front of the "ing" found by backtracking
from the end of the string. This is EXACTLY the point where the right part of the
rule is executed for the first time with 'true as the result. So "ing" is matched
now; what next?

We are STILL deep in recursion, but as the rule was successful, it DOESN'T return
one level higher; it tries to find another rule on the same level. But there isn't
one. "to end" would solve it. The result is that even though "ing" got matched and
the rule was applied successfully, the whole parse isn't successful. It's like
expecting parse "ringing bells" [thru "ing"] to return 'true; it doesn't, as
"to end" is missing ...

> And I wanted to see if we had a separator just behind the "ing", which means
> I should use 'thru.
>
> >> To make things even more spooky, look at this:
> >>
> >> sep: charset " ,.!?"
> >> b1: [skip b1 | "ing"]
> >> b2: [skip b2 | "ing" sep to end]
> >> a: [b1 | b2]
> >>
> >> The idea here is that b1 will match any string ending in "ing", and b2 will
> >> match any string which contains "ing" followed by a separator. Now, watch
> >> this:
> >>
> >> >> parse/all "ringing bells" b2
> >> == true
>
> >it's just the same as 'b1, it just contains the code to skip to the end of the
> >string ....
>
> Yes. As I said above:
>
> The idea here is that b1 will match any string ending in "ing",

no, it will match any string CONTAINING "ing". Look at what b1 does by itself.
Let's assume a string not containing "ing":

->> str: "some text without i.n.g."

->> b1: [skip b1-mark: (print ["b1: " b1-mark index? b1-mark]) b1 | b2-mark: (print ["b2: " b2-mark index? b2-mark]) "ing"]
== [skip b1-mark: (print ["b1: " b1-mark index? b1-mark]) b1 | b2-mark: (print ["b2: " b2-mark index? b2-mark]) "ing"]

->> parse/all str b1
b1:  ome text without i.n.g. 2
b1:  me text without i.n.g. 3
b1:  e text without i.n.g. 4
b1:   text without i.n.g. 5
b1:  text without i.n.g. 6
b1:  ext without i.n.g. 7
b1:  xt without i.n.g. 8
b1:  t without i.n.g. 9
b1:   without i.n.g. 10
b1:  without i.n.g. 11
b1:  ithout i.n.g. 12
b1:  thout i.n.g. 13
b1:  hout i.n.g. 14
b1:  out i.n.g. 15
b1:  ut i.n.g. 16
b1:  t i.n.g. 17
b1:   i.n.g. 18
b1:  i.n.g. 19
b1:  .n.g. 20
b1:  n.g. 21
b1:  .g. 22
b1:  g. 23
b1:  . 24
b1:   25
b2:   25
b2:  . 24
b2:  g. 23
b2:  .g. 22
b2:  n.g. 21
b2:  .n.g. 20
b2:  i.n.g. 19
b2:   i.n.g. 18
b2:  t i.n.g. 17
b2:  ut i.n.g. 16
b2:  out i.n.g. 15
b2:  hout i.n.g. 14
b2:  thout i.n.g. 13
b2:  ithout i.n.g. 12
b2:  without i.n.g. 11
b2:   without i.n.g. 10
b2:  t without i.n.g. 9
b2:  xt without i.n.g. 8
b2:  ext without i.n.g. 7
b2:  text without i.n.g. 6
b2:   text without i.n.g. 5
b2:  e text without i.n.g. 4
b2:  me text without i.n.g. 3
b2:  ome text without i.n.g. 2
b2:  some text without i.n.g. 1
== false
->>

huh :-) and we are back from all the recursive calls. Either way, I would suggest
forgetting your technique, as it's very inefficient ...
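The trace above can be reproduced with a small Python model of b1, which may make the backtracking order easier to follow. It assumes PEG-style ordered choice (try `skip b1` first, fall back to "ing" at the same position) and records the 1-based positions the print statements would show; the function and its `trace` argument are hypothetical.

```python
def b1(s, pos, trace):
    """Model of
        b1: [skip b1-mark: (print [...]) b1 | b2-mark: (print [...]) "ing"]
    using ordered choice with backtracking. Returns the end position on
    success or None on failure. trace collects ("b1"/"b2", index) pairs,
    where index is 1-based like REBOL's index?."""
    # First alternative: skip one char, "print" the new position, recurse.
    if pos < len(s):
        trace.append(("b1", pos + 2))
        end = b1(s, pos + 1, trace)
        if end is not None:
            return end
    # Second alternative: "print" the current position, then try "ing".
    trace.append(("b2", pos + 1))
    if s.startswith("ing", pos):
        return pos + 3
    return None


trace = []
print(b1("some text without i.n.g.", 0, trace))  # None: the rule fails
for tag, index in trace[:3] + trace[-3:]:
    print(tag, index)
```

Each character adds one level of recursion, which is one reason the approach scales poorly on long strings (Python's own default recursion limit of roughly 1000 frames would be hit by a string of about that length).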

and now, let's change str to contain "ing", just to see when we get out of the
rule:

->> str: "some text withing i.n.g."
== "some text withing i.n.g."

->> parse/all str b1
b1:  ome text withing i.n.g. 2
b1:  me text withing i.n.g. 3
b1:  e text withing i.n.g. 4
b1:   text withing i.n.g. 5
b1:  text withing i.n.g. 6
b1:  ext withing i.n.g. 7
b1:  xt withing i.n.g. 8
b1:  t withing i.n.g. 9
b1:   withing i.n.g. 10
b1:  withing i.n.g. 11
b1:  ithing i.n.g. 12
b1:  thing i.n.g. 13
b1:  hing i.n.g. 14
b1:  ing i.n.g. 15
b1:  ng i.n.g. 16
b1:  g i.n.g. 17
b1:   i.n.g. 18
b1:  i.n.g. 19
b1:  .n.g. 20
b1:  n.g. 21
b1:  .g. 22
b1:  g. 23
b1:  . 24
b1:   25   ; skip fails for the first time here :-)
b2:   25   ; second part of rule applied for the first time :-)
b2:  . 24
b2:  g. 23
b2:  .g. 22
b2:  n.g. 21
b2:  .n.g. 20
b2:  i.n.g. 19
b2:   i.n.g. 18
b2:  g i.n.g. 17
b2:  ng i.n.g. 16
b2:  ing i.n.g. 15
== false
->>

oops, do you see? "ing" was applied, but we are NOT returning to upper levels of
recursion. The result is 'false, but why? Because the rule doesn't contain "to end".
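A Python model of the same rule shows where the 'false comes from. In this sketch (hypothetical names, ordered choice assumed), b1 on the "withing" string succeeds and stops just past the "ing" at 1-based index 15; parse then reports false only because the rest of the string is still unconsumed, and adding the equivalent of "to end" turns the result into true.

```python
def b1(s, pos=0):
    """Model of  b1: [skip b1 | "ing"]  with ordered choice: try `skip b1`
    first, and only if that fails try "ing" at the same position.
    Returns the end position on success, None on failure."""
    if pos < len(s):
        end = b1(s, pos + 1)
        if end is not None:
            return end  # success is passed straight up to the caller
    return pos + 3 if s.startswith("ing", pos) else None


def b1_no_recursion(s):
    """Same end position without recursion: jump past the LAST "ing"."""
    i = s.rfind("ing")
    return None if i == -1 else i + 3


def parse_all(s, rule, then_to_end=False):
    """Succeed only if the rule matches and the tail of s is reached.
    then_to_end=True models appending `to end` to the rule."""
    end = rule(s)
    if end is None:
        return False
    if then_to_end:
        end = len(s)
    return end == len(s)


s = "some text withing i.n.g."
print(b1(s), b1_no_recursion(s), len(s))   # 17 17 24 (0-based positions)
print(parse_all(s, b1))                    # False: " i.n.g." is left over
print(parse_all(s, b1, then_to_end=True))  # True
```

The `rfind` version is one way to avoid the per-character recursion; it relies on the observation (true in this model) that ordered choice makes b1 stop after the last "ing" in the string.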


> and b2 will
> match any string which contains "ing" followed by a separator.
>
> Notice the use of "ending in" and "contains".
>
> [...]
> >6) so b1 is true for the first time and we are standing just behind the "ing"
> >string, so at beginning of "bells", or " bells", once parse/all is used ....
>
> You _do_ mean that we're in front of the "ing" string, right?

yes, once we are just in front of "ing", we can finally apply "ing" from the right
part of the rule ...

> We cannot see
> what is just to the left of our position, so talking about what happens when
> we're at the right of "ing" makes no sense.
>
> Besides, b1 will never ever return true for "ringing bells", as this string
> does not end in "ing".
>

:-) In reality, it WILL. The rule [X | Y] returns true if one of its alternatives
returns true (similar to REBOL's ANY function). And as "ing" got matched, it WILL
return true ...

Why should the string end in "ing"? Please look at the console output to understand
what the sequence of "skip b1" does ...
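The ordered-choice behaviour of a: [b1 | b2] can be sketched the same way. In this Python model (hypothetical function names; `SEP` mirrors the charset " ,.!?"), b2 alone accepts "ringing bells", but a never gets to try b2 because b1 already succeeds, leaving " bells" unconsumed.

```python
SEP = set(" ,.!?")  # mirrors  sep: charset " ,.!?"


def b1(s, pos=0):
    """Model of  b1: [skip b1 | "ing"]; returns end position or None."""
    if pos < len(s):
        end = b1(s, pos + 1)
        if end is not None:
            return end
    return pos + 3 if s.startswith("ing", pos) else None


def b2(s, pos=0):
    """Model of  b2: [skip b2 | "ing" sep to end]."""
    if pos < len(s):
        end = b2(s, pos + 1)
        if end is not None:
            return end
    j = pos + 3
    if s.startswith("ing", pos) and j < len(s) and s[j] in SEP:
        return len(s)  # "to end" moves to the tail
    return None


def a(s, pos=0):
    """Model of  a: [b1 | b2]. Ordered choice: b2 is tried only if b1 fails,
    and once b1 succeeds the choice is not revisited."""
    end = b1(s, pos)
    return end if end is not None else b2(s, pos)


def parse_all(s, rule):
    """Succeed only if the rule matches and the whole string is consumed."""
    end = rule(s)
    return end is not None and end == len(s)


s = "ringing bells"
print(parse_all(s, b2))  # True: "ing" followed by " ", then to end
print(parse_all(s, a))   # False: b1 stops after "ringing", b2 is never tried
```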

> >and now back to parse/all a
>
> >a is subrule!
>
> Huh???
>
> >b2 part of this subrule is NEVER applied. Know why? Just because at some point, we
> >returned true from b1, so there is no need to perform the OR option. The 'false is
> >there, because all we need to do is to skip to the end of the string ...
>
> But if we returned true from b1, the result of the whole parsing should be
> true too.

No, parse "blabla something here" [thru "something"] will not return true, although
the rule was successfully applied!

> But of course b1 will not return true, as b1 will only match
> strings _ending_ with "ing", and "ringing bells" is not ending with "ing".

see above, wrong assumption here.

> Either I do not understand you, or you have missed the point that the rule to
> the left of the "|" is returning _false_, not true.

yes, you are right :-) It's returning false, but stepping one char back each time.
So if the left side returns false, let's try to apply the right side ...

>
> >> >change 'a to [b to end] and once successfully back from 'b, it will continue "to
> >> >end" and return "true" ...
> >>
> >> But that's not what it should do. Then the rule would match all strings with
> >> "ing" in it.
>
> >Yes, backtracked from the tail of the string ...
>
> I repeat: That's not what it should do.
>
> >> We only want to match strings that either end in "ing" or have a
> >> word inside it (words are separated by spaces, commas, etc.) that end with
> >> "ing".
> >>
>
> >I don't understand it as for above parse/all "ringing bells" [thru "ing"] should
> >be sufficient ...
>
> Nope, see below:
>
> >well if you want to return false for str:  "ringdong bells", that's another story
> >....
>
> And that's exactly what I want to do!
>
> >parse/all str [thru "ing " to end]
>
> But a word can end in ",", ".", "!", "?", etc. I wanted to cover all these
> instances with my "sep" variable, which is a bitset. You cannot do that in
> the above way, since 'thru doesn't accept the block ["ing" sep] instead of
> your "ing " string.
>

well, anyway, I would suggest not using a recursive approach here. It will fail with
longer strings, it will skip by one character each time, etc.

sep: charset " ,.!?"

if any [
    ; str ends in "ing" (this could be generalised to
    ; (skip tail str negate length? substr-given))
    (skip tail str -3) == "ing"
    ; or some "ing" is directly followed by a separator
    parse/all str [some [thru "ing" sep to end | thru "ing"]]
][print "Yeah!"]

hmm? It's not all solved using 'parse, but it seems to work ...
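For comparison, the goal Ole describes (the string ends in "ing", or contains "ing" followed by one of the separators) can be checked without recursion. The Python sketch below is only an illustration: `ing_word` and its parameters are hypothetical, and it joins the two tests with *or*, following Ole's description.

```python
import re

SEP = " ,.!?"  # same characters as the REBOL charset " ,.!?"


def ing_word(s, suffix="ing", sep=SEP):
    """True if s ends in suffix, or suffix is immediately followed by one
    of the separator characters somewhere in s."""
    if s.endswith(suffix):
        return True
    # Character class of the separators, escaped so each one is literal.
    pattern = re.escape(suffix) + "[" + re.escape(sep) + "]"
    return re.search(pattern, s) is not None


for text in ["ringing bells", "bells ringing", "ringdong bells"]:
    print(text, ing_word(text))
```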

hope-this-helps

See ya,

-pekr-

> Kind regards,
> --
> Ole Friis <[EMAIL PROTECTED]>
>
> "Ignorance is bliss"
> (Cypher, The Matrix)