Re: Interesting little regex

Uri Guttman Thu, 23 Feb 2006 10:02:46 -0800

>>>>> "AY" == Alan Young <[EMAIL PROTECTED]> writes:


  AY> I know, replying to myself.
  AY> Parsing the KJV Bible took about 7 seconds with this:

  AY> #!/usr/bin/perl -w

  AY> use strict;

  AY> my $text = do {
  AY>   open my $T, '<./kjv10.txt' or die "Couldn't open kjv10.txt: $!\n";
  AY>   local $/;
  AY>   <$T>;
  AY> };

use File::Slurp ;

my $text = read_file( 'bibble' ) ;

much faster that way.

  AY> my %unique;

  AY> $text =~ s{(
  AY>              (\b\w+(?:['-]+\w+)*\b)

why the multiple ['-] inside the words? could those chars ever begin or
end words? so just [\w'-]+ should be fine there.
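
a quick way to see where the two patterns differ is to run both over a string that has quotes and dashes at word edges. this is only a sketch; the sample string is made up for illustration and is not from the KJV file:

```perl
use strict;
use warnings;

# Hypothetical sample text with a leading quote and a free-standing "--".
my $sample = "don't stop -- 'tis well-known";

# Pattern from the quoted script: ' and - are allowed only between \w runs.
my @strict = $sample =~ /\b\w+(?:['-]+\w+)*\b/g;

# Simplified pattern: ' and - are allowed anywhere in the token.
my @loose = $sample =~ /[\w'-]+/g;

print "strict: @strict\n";
print "loose:  @loose\n";
```

the two lists agree whenever those chars only show up inside words, which is the assumption behind the simpler pattern.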

  AY>              (??{!$unique{$^N}++?"(?=)":"(?!)"})

i am not sure why you do that boolean trick there. i have seen it before
(and actually use it somewhere) but what is its purpose here?
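
for anyone who hasn't met the construct: (??{ ... }) runs perl code and uses the string it returns as a subpattern, so returning "(?=)" (always matches) or "(?!)" (never matches) turns the code into a pass/fail test. a minimal sketch, assuming a recent perl and using made-up single-letter tokens so that backtracking can't retry with a shorter word:

```perl
use strict;
use warnings;

# Hypothetical input; single-letter tokens leave nothing to backtrack into.
my $text = "a b a c b";

my ( %seen, @new );

# $^N holds the text of the most recently closed capture group.
# First sighting of a token: return "(?=)" so the match succeeds.
# Later sightings: return "(?!)" so the match fails at that position.
while ( $text =~ /(\w)(??{ $seen{$^N}++ ? '(?!)' : '(?=)' })/g ) {
    push @new, $1;
}

print "first sightings: @new\n";
print "$_ seen $seen{$_} time(s)\n" for sort keys %seen;
```

note that the counter is bumped on every attempt, pass or fail, so the hash still ends up holding totals.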

  AY>            )
  AY>           }{
  AY>            $1

since you just replace the word by itself, why use s///? m// will get
the same results and should be much faster.
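
a rough sketch of the m// version, assuming the goal is just a count per word. the sample text is a stand-in for the slurped file and the pattern is the one from the quoted script, minus the (??{ }) part:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped file contents.
my $text = "In the beginning God created the heaven and the earth.";

my %unique;

# m//g in list context returns every match, so no substitution is needed.
$unique{$_}++ for $text =~ /\b\w+(?:['-]+\w+)*\b/g;

print "$_ => $unique{$_}\n" for sort keys %unique;
```

nothing is written back into $text here, which is the work that s/// does for no gain when the replacement is just $1.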

  AY>           }xg;

  AY> print "$_ => $unique{$_}\n" for sort keys %unique;

if you want raw speed, that makes lots of calls to print which is very
slow as it needs to invoke stdio code for each call. this should be
faster (even with the ram usage):

        print map "$_ => $unique{$_}\n", sort keys %unique;
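
one way to check that claim on a given box is the core Benchmark module. this is a sketch only: the hash contents and the sub names per_line and one_call are invented for the test, and output goes to the null device so the terminal doesn't dominate the timing:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical data standing in for the real word counts.
my %unique = map { ( "word$_" => $_ ) } 1 .. 10_000;

# One print call per key, as in the quoted script.
sub per_line {
    my ($fh) = @_;
    print {$fh} "$_ => $unique{$_}\n" for sort keys %unique;
}

# Build the whole list first, then hand it to a single print call.
sub one_call {
    my ($fh) = @_;
    print {$fh} map "$_ => $unique{$_}\n", sort keys %unique;
}

open my $null, '>', File::Spec->devnull
    or die "Couldn't open the null device: $!\n";

# Run each version for at least one CPU second and print a comparison table.
cmpthese( -1, {
    per_line => sub { per_line($null) },
    one_call => sub { one_call($null) },
} );
```

both subs pay for the same sort, so any gap in the table comes from the printing side. results will vary with perl version and platform.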

i am curious how much faster it will run with all those changes. :)

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
