Hi Petr,

Here's a technique Gabriele taught me:

>> str: "Something to say, some text to parse, something else"
== {Something to say, some text to parse, something else}
>> rule: ["s" "om" skip " " "t" thru "t" " " "to" " " "p" skip skip "se" to end]
== ["s" "om" skip " " "t" thru "t" " " "to" " " "p" skip skip "se" to end]
>> parse/all str [here: some [rule here: | skip] :here]
== true

This tries to match RULE at every position in the string, because wherever
RULE fails, SKIP advances one character. The danger is that PARSE would
return TRUE for every string, since SKIP alone will carry you through to the
end. To prevent that, HERE is first set to the beginning of the string, reset
to the current position only once RULE has matched, and finally used to set
the parse position (:here) at the end. If RULE never matched, the position
goes back to the beginning and PARSE returns FALSE; if it did match, TO END
has left HERE at the end of the string and PARSE returns TRUE.
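
A quick way to convince yourself, assuming RULE is still defined as above
(the test string is just one I made up that RULE can't match anywhere):

>> parse/all "nothing relevant here" [some [rule | skip]]
== true
>> parse/all "nothing relevant here" [here: some [rule here: | skip] :here]
== false

Without HERE, SKIP alone reaches the end and you get TRUE; with it, the
position is put back to the beginning and you get FALSE.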

The only trouble here is that "t" thru "t" behaves just like the wildcard
pattern t*t: any amount of data can come in between. There's no way to keep
THRU from skipping over spaces into a later word. Further, THRU stops at the
first "t" it finds, so if there's an extra "t" in the middle of the word
you're sunk. For example,

>> str2: "Something to say, some testament to parse, something else"
== {Something to say, some testament to parse, something else}
>> parse/all str2 [here: some [rule here: | skip] :here]
== false
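
The opposite failure, THRU running across a word boundary, can be seen with
another made-up string (again assuming RULE as defined above):

>> parse/all "some tea at to parse" [here: some [rule here: | skip] :here]
== true

Here "t" thru "t" starts in "tea" and ends in "at", so two different words
are accepted where one was intended.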

To restrict "t" and "t" to the same word, and allow extra "t"s in between,
I think you have to do something like this:

>> letter: make bitset! [#"a" - #"z" #"A" - #"Z"]
== make bitset! #{
0000000000000000FEFFFF07FEFFFF0700000000000000000000000000000000
}
>> r2: ["t" " " "to" " " "p" skip skip "se" to end here:]
== ["t" " " "to" " " "p" skip skip "se" to end here:]
>> r1: ["s" "om" skip " " "t" some [r2 | letter] ]
== ["s" "om" skip " " "t" some [r2 | letter]]
>> parse/all str [here: some [r1 | skip] :here]
== true

But I think this technique is too complex to be really practical.

I've been lobbying for more regular-expression-like parsing, to overcome the
problem you mention of parsing rules failing when one part matches too early.
I've also written a series of functions that use matching rules that behave
more like regular expressions:

>> r2: ["s" "om" skip " " "t" some letter "t" " " "to" " " "p" skip skip "se"]
== ["s" "om" skip " " "t" some letter "t" " " "to" " " "p" skip skip "se"]
>> result: search str r2
== [19 18]
>> copy/part at str first result second result
== "some text to parse"

What makes this "regular-expression-like" is the backtracking. In a plain
parse rule, SOME LETTER will match up through the end of the word "text",
leaving nothing for the following "t" to match. SEARCH will backtrack to see
if there's some other way to match. SEARCH also overcomes the problem of the
middle "t" in "testament".
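
To see the greedy behaviour with plain PARSE, here's a small check. The test
string is one I made up, and LETTER is the bitset defined earlier:

>> parse/all "text to parse" ["t" some letter "t" " " "to" " " "p" skip skip "se"]
== false

SOME LETTER swallows "ext", so the following "t" is tried against the space
and fails. PARSE doesn't go back and give up a letter.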

If anyone's interested, I could post a copy of the script. (Unfortunately
it's too slow for most practical uses, but someone might find it amusing.)

See you,
Eric
