On 21.04.2012 22:46, H. S. Teoh wrote:
On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
On 21.04.2012 21:24, nrgyzer wrote:
Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";


Simply put pure regex is incapable of arbitrary-nested if-else
statements. In a sense if... /if is a case of balanced parens
problem and it's widely know to form non-regular language.

This is of theoretical interest to me. Very often I find myself wanting
a concise way to express patterns with nested matching delimiters, but
regexes can't do it. But to jump to a full-blown stack-based language
seems like an overkill to me: stack languages can express *much* more
than just nested delimiters, most of which is difficult to encapsulate
in a nice, concise syntax like regexes. All I want is a simple way to
express the kind of simple nesting that matching delimiters give.


Recursive descent is not particularly bad unless minimal "grammar descent depth" is high. Example:
a+b-c
uses a quite a lot of recursive calls for grammar with e.g. 10+ operator precedence levels.


I'm thinking of merging operator precedence parser with regex might be a happy marriage you know.

Back to OP topic something along the lines of this will do it (beware of stack overflow):

void parseIf(){
        static int ifNest;
        if(input.startWith("{if")){
                ifNest++;
                scope(exit) ifNest--;
                enforce(ifNest < 10000, "conservative stack overflow");
                parseCond(input[2..$-1]);//regex that does condition
                enforce(input[$-1] == '}', "close that if");
                parseIf();//parse body or another nested if
                //parseElse();//goes here, as does elif
        }
        else
                parseBody();// the regex you used before
}



[...]
One day I may add push-pop stack extensions (that allow this kind of
thing) into regex but I'd have to think hard to make it efficient.
[...]

Plus, have a nice concise syntax for it (otherwise I might as well just
write a recursive descent parser for it in the first place).


Concise syntax & lots of power is a priority actually, because I can detect if push-pop stuff is used or not and pick the right engine for the job. So different big-oh complexity is not important.

--
Dmitry Olshansky

Reply via email to