Dear Sirs,
I hope you can help me understand the following
facts...
I'm a Computer-Science student and I'm reading a book
about Perl, Programming Perl III ed. published by O'Reilly.
I find regular expressions very interesting and I tried to
use a regex analyzed in the book, the one to check if a string is
()-balanced. I rewrote it to check balancing also for [] and {}.
The regex is the following (I'm not so good writing regex,
because it is the first time I study this argument, so I'm not so sure the
following is correct or optimized...)
qr{
#start regex
(?> #non
backtracking subpattern
( [\(\[\{]
)
#opening. Store in $1
(?:
#clustering
(?>
#non backtracking subpattern
[^\(\)\[\]\{\}]*)
#not a paren
|
#or
(??{$re}
#recursion
)
#close non backtracking subpattern
)*
#zero or more times
(? #BEGIN
let's match the corrisponding closing paren
(?{$1 eq
"("}) #if
\) #then
|
#else
(?
#BEGIN
(?{$1 eq
"["}) #if
\]
#then
|
#else
\}
)
#close BEGIN
) #close
BEGIN
)
#close non backtracking subpattern
}ox;
#close regex :-))
so the regex is
qr{(?>([\(\[\{])(?:(?>[^\(\)\[\]\{\}]*)|(??{$re}))*(?(?{$1 eq
"("})\)|(?(?{$1 eq "["})\]|\})))}ox;
I think this is quite a programming miracle.
Here is the C-corresponding code (!)
void parseParenthesis(const char* line)
{ char* temp=(char*) calloc(strlen(line),sizeof(char)); int pos=0; while((*line)&&(pos>=0))
{ char cur=*line++; if((cur=='(')||(cur=='[')||(cur=='{'))
temp[pos++]=cur;
else { int closed=0; char searchFor; switch(cur)
{ case ')': { closed=1; searchFor='('; break; } case ']': { closed=1; searchFor='['; break; } case '}': { closed=1; searchFor='{'; break; } } if(closed) { --pos; if((pos<0)||(temp[pos]!=searchFor)) { errorBalancing(pos); printf("[ABORTED]\n"); return; } else temp[pos]='\0'; } } } if(temp[0]!='\0')
{ errorBalancing(pos); printf("[ABORTED]\n"); return; } printf("[Ok]\n");
return; } Ok. So, where's the problem?
=>Under Windows...
I executed the C program to parse a 200 000 character string
(balanced and not) (!) to test the time required to get the job done. It's
quite istantaneous.
I executed the perl script on the same 200 000 character
string (balanced and not) (!) and it took less than 3 seconds on my
system.
=>Under Red Hat Linux...
I executed the C program to parse a 200 000 character string
(balanced and not) (!). It's quite istantaneous.
I executed the perl script on the same 200 000 character
string (balanced and not) (!) and... it crashed!
I tried to port the C code in perl. I translated it and executed it with
the same huge string. It took less than 3 seconds on my system.
But, why did the regex crashed?!
While I know that the cpu complexity of the C algorithm (and the perl
translated one) is linear in the dimension of the string, I know nothing about
the complexity of the regex execution. I think there are too many
hidden details that prevent me to deeply take under control my
codes. Thank you in advance for every suggestion you may want to give
me.
Yours sincerely,
Luca Veraldi
----------------------------------------
Visit me at http://www.supremo.3000.it/ and sign in my Guest Book or email me at [EMAIL PROTECTED] ICQ# 115368178 |
- Re: Handling Perl regexes Luca Veraldi
- Re: Handling Perl regexes Jeff 'japhy' Pinyan
- Re: Handling Perl regexes Jeff 'japhy' Pinyan