On Wed, 16 Feb 2005 16:25:57 +0100, David Landgren <[EMAIL PROTECTED]> wrote:
> demerphq wrote:
> > On Tue, 15 Feb 2005 13:11:21 -0800, Justin Mason <[EMAIL PROTECTED]> wrote:
> >
> >>-----BEGIN PGP SIGNED MESSAGE-----
> >>Hash: SHA1
> >>
> >>Yitzchak Scott-Thoennes writes:
> >>
> >>>On Tue, Feb 15, 2005 at 12:40:50PM -0800, Justin Mason wrote:
> >>>
> >>>>FWIW, this looks like it'd be excellent for SpamAssassin ;)
> >>>
> >>>Maybe somebody with SpamAssassin could do some benchmarks?
> >>
> 
> [...]
> 
> > I did however do some benchmarks on the perfomance of one particular
> > regex in the SA suite. Its from SA::Bayes and looks like this:
> >
[...]
> 
> If you're benching, can you compare how your trie version (out)?performs
>  regexps produced by Regexp::Optimizer and Regexp::Assemble on the
> above list of words?

Yeah ive put together a benchmark program to test my current build
against blead. It adds whitespace and the /x modifier to the regexes
to make them easier to read. It tests the same list of words (sorted
however) as was in the SA module with and without \b wrappers as
constructed by a bunch of the Regexp modules. If anybody is interested
in seeing the code they should ask off list, but the underlying
benchmark is:
  
  my $found=0; $found++ while $text=~/$re{$name}/g;

Where $text is my local regcomp.c (just because its a large file)

Here is the output:
----------------------------------------------------
used the following Regex modules: Regexp::List, Regexp::Assemble,
        Regexp::Optimizer, Regex::PreSuf
Read 181168 chars from '\blead_trie\regcomp.c'
Got 74 words to match
Testing regexes:
Regexp 'orig' :
(?x-ism:(
        (?: a (?: nd | ny | ble | ll | re )| m (?: uch | ost | ade | ore |
        ail | ake | ailing | any | ailto )| t (?: his | he | ime | hrough |
        hat )| w (?: hy | here | ork | orld | ith | ithout | eb )| f (?:
        rom | or | ew )| e (?: ach | ven | mail )| o (?: ne | ff | nly | wn
        | ut )| n (?: ow | ot | eed )| s (?: uch | ame )| l (?: ook | ike |
        ong )| y (?: ou | our | ou're )| The | has | have | into | using |
        http | see | It's | it's | number | just | both | come | years |
        right | know | already | people | place | first | because | And |
        give | year | information | can )
))
Regexp 'RA' [Regexp::Assemble] :
(?x-ism:(
        (?: m (?: a (?: il (?: ing | to )?| [dk]e | ny )| o (?: re | st )|
        uch )| w (?: ith (?: out )?| h (?: ere | y )| or (?: ld | k )| eb
        )| a (?: l (?: ready | l )|(?: bl | r ) e | n[dy] )| t (?: h (?:
        rough | at | is | e )| ime )| i (?: n (?: formation | to )| t's
        )|(?: p (?: eopl | lac )| giv | Th ) e | o (?: n (?: ly | e )| ff |
        ut | wn )| y (?: ou (?: 're | r )?| ears ?)| n (?: o[tw] | umber |
        eed )| f (?: irst | rom | ew | or )| l (?: o (?: ng | ok )| ike )|
        h (?: a (?: ve | s )| ttp )| s (?:(?: am | e ) e | uch )| e (?:
        mail | ach | ven )| b (?: ecause | oth )|(?: righ | jus ) t | c (?:
        ome | an )| using | It's | know | And )
))
Regexp 'RL' [Regexp::List] :
(?x-ism:(
        (?= [AITabcefghijklmnoprstuwy] )(?: a (?: l (?: l | ready )| n[dy]
        |(?: bl | r ) e )| b (?: ecause | oth )| c (?: an | ome )| e (?:
        ach | mail | ven )| f (?: ew | irst | or | rom )| h (?: a (?: s |
        ve )| ttp )| i (?: n (?: formation | to )| t\'s )| l (?: o (?: ng |
        ok )| ike )| m (?: a (?: il (?: ing | to )?| [dk]e | ny )| o (?: re
        | st )| uch )| n (?: o[tw] | eed | umber )| o (?: n (?: e | ly )|
        ff | ut | wn )| p (?: eopl | lac ) e | s (?:(?: am | e ) e | uch )|
        t (?: h (?: at | e | is | rough )| ime )| w (?: h (?: ere | y )|
        ith (?: out )?| or (?: k | ld )| eb )| y (?: ears ?| ou (?: \'re |
        r )?)| And | It\'s |(?: Th | giv ) e |(?: jus | righ ) t | know |
        using )
))
Regexp 'RO' [Regexp::Optimizer] :
(?x-ism:(
        (?: a (?: l (?: l | ready )| n[dy] |(?: bl | r ) e )| b (?: ecause
        | oth )| c (?: an | ome )| e (?: ach | mail | ven )| f (?: ew |
        irst | or | rom )| h (?: a (?: s | ve )| ttp )| i (?: n (?:
        formation | to )| t's )| l (?: o (?: ng | ok )| ike )| m (?: a (?:
        il (?: ing | to )?| [dk]e | ny )| o (?: re | st )| uch )| n (?:
        o[tw] | eed | umber )| o (?: n (?: e | ly )| ff | ut | wn )| p (?:
        eopl | lac ) e | s (?:(?: am | e ) e | uch )| t (?: h (?: at | e |
        is | rough )| ime )| w (?: h (?: ere | y )| ith (?: out )?| or (?:
        k | ld )| eb )| y (?: ears ?| ou (?: 're | r )?)| And | It's |(?:
        Th | giv ) e |(?: jus | righ ) t | know | using )
))
Regexp 'RPS' [Regex::PreSuf] :
(?x-ism:(
        (?: And | It\'s | The | a (?: ble | l (?: ready | l )| n[dy] | re
        )| b (?: ecause | oth )| c (?: an | ome )| e (?: ach | mail | ven
        )| f (?: ew | irst | or | rom )| give | h (?: a (?: ve | s )| ttp
        )| i (?: n (?: formation | to )| t\'s )| just | know | l (?: ike |
        o (?: ng | ok ))| m (?: a (?: de | il (?: ing | to )?| ke | ny )| o
        (?: re | st )| uch )| n (?: eed | o[tw] | umber )| o (?: ff | n (?:
        ly | e )| ut | wn )| p (?: eopl | lac ) e | right | s (?: ame | ee
        | uch )| t (?: h (?: at | is | rough | e )| ime )| using | w (?: eb
        | h (?: ere | y )| ith (?: out )?| or (?: ld | k ))| y (?: ears ?|
        ou (?: \'re | r )?))
))
Regexp 'smpl' :
(?x-ism:(
        And | It's | The | able | all | already | and | any | are | because
        | both | can | come | each | email | even | few | first | for |
        from | give | has | have | http | information | into | it's | just
        | know | like | long | look | made | mail | mailing | mailto | make
        | many | more | most | much | need | not | now | number | off | one
        | only | out | own | people | place | right | same | see | such |
        that | the | this | through | time | using | web | where | why |
        with | without | work | world | year | years | you | you're | your
))
Running tests for 2 seconds on 2 perls:
Pfx |Perl
----+--------------------------
A   |g:\perl_trie\bin\perl.exe
B   |g:\perl_blead\bin\perl.exe
----+--------------------------
         Rate B_smpl B_orig B_RO B_RA B_RPS A_RA B_RL A_RO A_RPS
A_orig A_RL A_smpl
B_smpl 2.87/s     --   -38% -43% -44%  -58% -58% -61% -64%  -65%  
-75% -77%   -89%
B_orig 4.64/s    62%     --  -7% -10%  -31% -33% -37% -42%  -43%  
-59% -62%   -82%
B_RO   4.99/s    74%     7%   --  -3%  -26% -28% -32% -37%  -39%  
-56% -60%   -81%
B_RA   5.15/s    80%    11%   3%   --  -24% -25% -30% -35%  -37%  
-55% -58%   -80%
B_RPS  6.75/s   136%    45%  35%  31%    --  -2%  -8% -15%  -17%  
-41% -45%   -74%
A_RA   6.89/s   140%    48%  38%  34%    2%   --  -6% -13%  -15%  
-39% -44%   -73%
B_RL   7.34/s   156%    58%  47%  42%    9%   7%   --  -8%  -10%  
-35% -41%   -71%
A_RO   7.95/s   177%    71%  59%  54%   18%  15%   8%   --   -2%  
-30% -36%   -69%
A_RPS  8.13/s   184%    75%  63%  58%   20%  18%  11%   2%    --  
-28% -34%   -68%
A_orig 11.4/s   296%   145% 128% 120%   68%  65%  55%  43%   40%    
--  -8%   -56%
A_RL   12.4/s   331%   166% 148% 140%   83%  80%  68%  56%   52%    
9%   --   -52%
A_smpl 25.7/s   795%   453% 414% 398%  280% 273% 250% 223%  216%  
126% 108%     --


Testing with /\b(PAT)\b/
Pfx |Perl
----+--------------------------
A   |g:\perl_trie\bin\perl.exe
B   |g:\perl_blead\bin\perl.exe
----+--------------------------
         Rate B_smpl B_orig B_RO B_RA A_RPS B_RPS A_RA A_RO B_RL
A_orig A_RL A_smpl
B_smpl 11.9/s     --   -38% -43% -43%  -48%  -56% -58% -63% -69%  
-74% -76%   -88%
B_orig 19.3/s    62%     --  -7%  -7%  -15%  -28% -32% -40% -50%  
-57% -62%   -80%
B_RO   20.7/s    74%     7%   --  -0%   -9%  -23% -27% -36% -46%  
-54% -59%   -79%
B_RA   20.8/s    75%     8%   0%   --   -9%  -23% -27% -36% -46%  
-54% -59%   -79%
A_RPS  22.7/s    91%    18%  10%  10%    --  -15% -20% -30% -41%  
-50% -55%   -77%
B_RPS  26.8/s   126%    39%  29%  29%   18%    --  -6% -17% -30%  
-41% -47%   -73%
A_RA   28.5/s   139%    47%  37%  37%   25%    6%   -- -12% -26%  
-37% -43%   -71%
A_RO   32.4/s   172%    68%  56%  56%   42%   21%  14%   -- -15%  
-28% -36%   -67%
B_RL   38.2/s   222%    98%  84%  84%   68%   42%  34%  18%   --  
-16% -24%   -61%
A_orig 45.3/s   281%   135% 118% 118%   99%   69%  59%  40%  18%    
-- -10%   -54%
A_RL   50.4/s   324%   161% 143% 143%  122%   88%  77%  56%  32%   
11%   --   -49%
A_smpl 98.9/s   732%   413% 377% 376%  335%  269% 248% 206% 159%  
119%  96%     --

-- 
First they ignore you, then they laugh at you, then they fight you,
then you win.
  +Gandhi

Reply via email to