>>>>> "AY" == Alan Young <[EMAIL PROTECTED]> writes:
  AY> Updated script at bottom.

  AY> On 2/23/06, Uri Guttman <[EMAIL PROTECTED]> wrote:
  AY> $text =~ s{(
  AY> (\b\w+(?:['-]+\w+)*\b)
  >>
  >> why the multiple ['-] inside the words? could those chars ever begin or
  >> end words? so just [\w'-]+ should be fine there.

  AY> It's possible to have multi-hyphenated words. I didn't think it was
  AY> worth the time to figure out how to handle that and single apostrophe
  AY> words at the same time. Besides, I'm not verifying the accuracy of
  AY> the text.

  AY> In the spirit of testing though, I changed it to (\b[\w'-]*\b) and it
  AY> took 40 seconds and found 's and ' as words where the original did
  AY> not.

no wonder it took so long. you matched the null string between each
pair of word boundaries. you need a +, not * there.

  AY> This is the way I understand it:
  AY> (??{<code>}) replaces the regex at the current pos() with the result
  AY> of the <code> block.
  AY> If the match ($^N) was not in the hash, then it would auto-vivify
  AY> the key and increment it and return (?!) which is a negative lookahead
  AY> on nothing, which always fails so we force it to backtrack and try
  AY> again.
  AY> If the match ($^N) is in the hash, then it increments the value and
  AY> returns (?=) which is a positive lookahead on nothing, which always
  AY> succeeds so we continue on.

i understand the boolean thing as i said previously. i was asking why
you used it there. i see no reason for it if all you are doing is word
counting.

  AY> Changing the regex to
  AY> 1 while $text =~ m{(
  AY> (\b\w+(?:['-]+\w+)*\b)
  AY> (?{!$unique{$^N}++})
  AY> )
  AY> }xg;
  AY> dropped the time down to 3s.

  >> since you just replace the word by itself, why use s///? m// will get
  >> the same results and should be much faster.

  AY> There was no appreciable difference between the two types of regexes
  AY> (see my code below).

try this:

    $unique{$1}++ while $text =~ m/([\w'-]+)/g ;

use the Benchmark module to compare the speeds.
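
A minimal sketch of such a comparison with the core Benchmark module's cmpthese, assuming a small synthetic sample text (substitute the real input). The sub names count_plain and count_code_block and the hash %SEEN are hypothetical. The second sub counts inside the match with a (?{ }) block using Alan's stricter word pattern, not his exact script. The two patterns agree on this sample but can differ on words with a leading or trailing apostrophe or hyphen, since [\w'-]+ keeps those characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: 200 copies of one sentence (8 distinct words).
# Substitute the real text being counted.
my $text = join ' ', ("It's a well-known fact that don't-care words repeat.") x 200;

# Plain m//g loop: bump a hash entry for every match.
sub count_plain {
    my ($str) = @_;
    my %unique;
    $unique{$1}++ while $str =~ m/([\w'-]+)/g;
    return \%unique;
}

# Count inside the match with a (?{ }) code block and Alan's word pattern.
# A localized package hash (hypothetical name %SEEN) is used because Perls
# before 5.18 could mishandle lexicals captured inside (?{ }).
our %SEEN;
sub count_code_block {
    my ($str) = @_;
    local %SEEN;
    1 while $str =~ m{ ( \b\w+ (?:['-]+\w+)* \b ) (?{ $SEEN{$^N}++ }) }xg;
    return { %SEEN };    # copy before local restores the old hash
}

# Run each sub for at least 1 CPU second and print relative rates.
cmpthese(-1, {
    plain_match => sub { count_plain($text) },
    code_block  => sub { count_code_block($text) },
});
```

The relative rates depend on the input and the Perl version, so it is worth running against the real text before drawing conclusions.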

make sure you don't do destructive parsing, which some of your examples
seem to do.

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  --------  http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org