Handling Perl regexes

Luca Veraldi Sun, 15 Jul 2001 07:57:06 -0700

Dear Sirs,

I hope you can help me understand the following facts...

I'm a Computer-Science student and I'm reading a book about Perl, Programming Perl III ed. published by O'Reilly.

I find regular expressions very interesting and I tried to use a regex analyzed in the book, the one to check if a string is ()-balanced. I rewrote it to check balancing also for [] and {}.

The regex is the following (I'm not so good writing regex, because it is the first time I study this argument, so I'm not so sure the following is correct or optimized...)

qr{ #start regex

(?> #non backtracking subpattern

( [\(\[\{] ) #opening. Store in $1

(?: #clustering

(?> #non backtracking subpattern

[^\[\]\{\}]*) #not a paren

| #or

(??{$re} #recursion

) #close non backtracking subpattern

)* #zero or more times

(? #BEGIN let's match the corrisponding closing paren

(?{$1 eq "("}) #if

\) #then

| #else

(? #BEGIN

(?{$1 eq "["}) #if

\] #then

| #else

) #close BEGIN

) #close non backtracking subpattern

}ox; #close regex :-))

so the regex is

qr{(?>([$\[\{])(?:(?>[^\($\[\]\{\}]*)|(??{$re}))*(?(?{$1 eq "("})\)|(?(?{$1 eq "["})\]|\})))}ox;

I think this is quite a programming miracle.

Here is the C-corresponding code (!)

void parseParenthesis(const char* line)
{
char* temp=(char*) calloc(strlen(line),sizeof(char));
int pos=0;

while((*line)&&(pos>=0))
{
char cur=*line++;

if((cur=='(')||(cur=='[')||(cur=='{')) temp[pos++]=cur;
else
{
int closed=0;
char searchFor;

   switch(cur)
   {
    case ')':
    {
     closed=1;
     searchFor='(';
     break;
    }
    case ']':
    {
     closed=1;
     searchFor='[';
     break;
    }
    case '}':
    {
     closed=1;
     searchFor='{';
     break;
    }
   }
   if(closed)
   {
    --pos;
    if((pos<0)||(temp[pos]!=searchFor))
    {
     errorBalancing(pos);
     printf("[ABORTED]\n");
     return;
    }
    else temp[pos]='\0';
   }
}
}

if(temp[0]!='\0')
{
errorBalancing(pos);
printf("[ABORTED]\n");
return;
}

printf("[Ok]\n");
return;
}

Ok. So, where's the problem?

=>Under Windows...

I executed the C program to parse a 200 000 character string (balanced and not) (!) to test the time required to get the job done. It's quite istantaneous.

I executed the perl script on the same 200 000 character string (balanced and not) (!) and it took less than 3 seconds on my system.

=>Under Red Hat Linux...

I executed the C program to parse a 200 000 character string (balanced and not) (!). It's quite istantaneous.

I executed the perl script on the same 200 000 character string (balanced and not) (!) and... it crashed!

I tried to port the C code in perl. I translated it and executed it with the same huge string. It took less than 3 seconds on my system.

But, why did the regex crashed?!

While I know that the cpu complexity of the C algorithm (and the perl translated one) is linear in the dimension of the string, I know nothing about the complexity of the regex execution. I think there are too many hidden details that prevent me to deeply take under control my codes.

Thank you in advance for every suggestion you may want to give me.

Yours sincerely,

Luca Veraldi

----------------------------------------
Visit me at http://www.supremo.3000.it/
and sign in my Guest Book or email me at
[EMAIL PROTECTED]
ICQ# 115368178

Handling Perl regexes

Reply via email to