I missed the beginning of this conversation (too much email while
I was away), but here are my relevant comments on the last message:

> > During my lunch break, I wrote a small program to
> > compare the execution of Tcl_StringMatch() (what
> > ns_register_filter uses to match URLs) and
> > Tcl_RegExpExec() (what I propose
> > ns_register_filter should have the option of using).

It all depends on what you are actually matching, and at what
level (C or Tcl).  I will show some Tcl-based examples only,
to explain what optimizations occur at the Tcl level.  Note that
in C you would best use Tcl_StringCaseMatch (which is all that
Tcl_StringMatch calls).  Also in C, if possible, storing the
RE as an obj and using the obj-based C RE functions is best.

In Tcl, you will see that string match is faster whether or
not the pattern uses special chars:

1 % time {eval string match foo bar} 1000
39 microseconds per iteration
2 % time {eval regexp foo bar} 1000
50 microseconds per iteration
3 % time {eval string match b* bar} 1000
38 microseconds per iteration
4 % time {eval regexp b.* bar} 1000
53 microseconds per iteration

REs have the advantage that the pattern can get more complex,
but if that isn't necessary, they will always be slower.  I
rewrote the string match algorithm in 8.4 for more speed.  The
above tests are on 8.4.2 BTW.  Also, I have to use eval above to
sidestep the magic of the bytecode compiler, which truly makes
things faster:

5 % time {string match foo bar} 1000
2 microseconds per iteration
6 % time {regexp foo bar} 1000
2 microseconds per iteration
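
If you want to reproduce the comparison on your own build, here
is a minimal sketch.  The bench helper is a name I made up for
illustration, and the numbers will differ by Tcl version and
machine, so no particular output is implied:

```tcl
# Sketch: compare bytecode-compiled calls with eval-dispatched ones.
# "bench" is a hypothetical helper, not part of Tcl or AOLserver.
proc bench {label script {iters 1000}} {
    # time returns a string of the form "N microseconds per iteration"
    puts [format "%-28s %s" $label [time $script $iters]]
}

bench "string match (compiled)" {string match foo bar}
bench "string match (via eval)" {eval string match foo bar}
bench "regexp (compiled)"       {regexp foo bar}
bench "regexp (via eval)"       {eval regexp foo bar}
```

The eval lines are the ones that skip the compiler's inlining,
which is why they are the fairer measure of the raw commands.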

and it isn't just static string checks, which do occur in part,
but sometimes more can be done with one command versus another:

[a is foo, b is bar]
7 % time {string match $a $b} 1000
3 microseconds per iteration
8 % time {regexp $a $b} 1000
16 microseconds per iteration
9 % time {regexp foo $b} 1000
3 microseconds per iteration

In the above you see the cost when the pattern is in a variable,
so I can't tell at compile time whether the regexp can be turned
into a string equal or string match.  BTW, the above gets even
more blurry since some of the cases become string equal under
the covers.
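
The same effect shows up inside procs.  A rough sketch, assuming
two hypothetical helpers (the names are mine, for illustration
only): one receives the pattern as an argument, the other has it
spelled out as a literal.  Timings will vary, so none are shown:

```tcl
# Hypothetical helpers; neither name comes from Tcl or AOLserver.
proc matchVarPattern {pat str} {
    # Pattern arrives in a variable: the compiler cannot inspect it.
    return [regexp $pat $str]
}
proc matchLiteralPattern {str} {
    # Pattern is a literal: the compiler can inspect it.
    return [regexp foo $str]
}

set b bar
puts "variable pattern: [time {matchVarPattern foo $b} 1000]"
puts "literal pattern:  [time {matchLiteralPattern $b} 1000]"
```

Both procs return the same answer for the same input; only the
amount of work left for run time differs.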

> I believe in Tcl 8.4 that if regexp sees a "simple" pattern (i.e. one
> that string match could use), then it will use string match to process
> it.

Yes, it is not an exhaustive check, but when compiling a
regexp I check for static strings, anchored strings and
do some handling for ".*" -> "*".  Here is an example where
regexp is never regexp under the covers; it is either string
match or string equal:

67 % time {regexp {^foo$} $b} 1000
2 microseconds per iteration
68 % time {string match {*foo*} $b} 1000
3 microseconds per iteration
69 % time {regexp foo $b} 1000
3 microseconds per iteration
70 % time {string equal foo $b} 1000
2 microseconds per iteration
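
To see why those substitutions are safe, here is a small sketch
that checks the simple regexps against their string-command
counterparts.  The list of sample inputs is arbitrary and chosen
by me for illustration:

```tcl
# Sketch: {^foo$} should agree with "string equal foo", and a bare
# "foo" pattern should agree with "string match *foo*".
foreach s {foo food xfoo xfoox bar fo "" FOO} {
    set anchored   [regexp {^foo$} $s]
    set equal      [string equal foo $s]
    set unanchored [regexp foo $s]
    set glob       [string match {*foo*} $s]
    if {$anchored != $equal || $unanchored != $glob} {
        error "mismatch for \"$s\""
    }
    puts [format {%-6s anchored=%d equal=%d unanchored=%d glob=%d} \
        $s $anchored $equal $unanchored $glob]
}
```

This only checks that the results agree; it says nothing about
which code path is taken under the covers.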

All that said, it depends on what the ns_register_proc is doing.
Is it all C code, or are we talking about using regexp vs.
string commands at the Tcl level?

  Jeff Hobbs                     The Tcl Guy
  Senior Developer               http://www.ActiveState.com/


