I missed the beginning of this conversation (too much email while
I was away), but here are my relevant comments on the last message:
> > During my lunch break, I wrote a small program to
> > compare the execution of Tcl_StringMatch() (what
> > ns_register_filter uses to match URLs) and
> > Tcl_RegExpExec() (what I propose
> > ns_register_filter should have the option of using).
It all depends on what you are actually matching, and at what
level (C or Tcl). I will show some Tcl-based examples only,
to explain what optimizations occur at the Tcl level. Note that in C
you would best use Tcl_StringCaseMatch (which is all that
Tcl_StringMatch calls). Also in C, if possible, storing the
RE as an obj and using the obj-based C RE functions is best.
In Tcl, you will see that string match is faster whether or not
the pattern uses special chars:
1 % time {eval string match foo bar} 1000
39 microseconds per iteration
2 % time {eval regexp foo bar} 1000
50 microseconds per iteration
3 % time {eval string match b* bar} 1000
38 microseconds per iteration
4 % time {eval regexp b.* bar} 1000
53 microseconds per iteration
REs have the advantage that the pattern can get more complex,
but if that isn't necessary, they will always be slower. I
rewrote the string match algorithm in 8.4 for more speed. The
above tests are on 8.4.2, BTW. Also, I had to use eval above
because of the magic that occurs with the bytecode compiler,
which truly makes things faster:
5 % time {string match foo bar} 1000
2 microseconds per iteration
6 % time {regexp foo bar} 1000
2 microseconds per iteration
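
For anyone who wants to rerun this comparison as a script rather
than at the prompt, here is a minimal sketch. The usec helper is a
name I made up for illustration (it is not part of Tcl); it only
pulls the number out of what time returns. Absolute figures will
differ by machine and Tcl version.

```tcl
# Hypothetical helper: run a script $iters times with [time] and
# return just the microseconds-per-iteration figure.
proc usec {script {iters 1000}} {
    return [lindex [time $script $iters] 0]
}

# The same four cases as above: two through eval, two direct.
foreach script {
    {eval string match foo bar}
    {eval regexp foo bar}
    {string match foo bar}
    {regexp foo bar}
} {
    puts [format "%-30s %s us/iteration" $script [usec $script]]
}
```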
and it isn't just static string checks, which do occur in part,
but sometimes more can be done with one command versus another:
[a is foo, b is bar]
7 % time {string match $a $b} 1000
3 microseconds per iteration
8 % time {regexp $a $b} 1000
16 microseconds per iteration
9 % time {regexp foo $b} 1000
3 microseconds per iteration
In the above you see the difference when I can't tell at compile
time whether I can turn the regexp into a string equal or string
match (case 8, where the pattern is in a variable). BTW, the above
gets even more blurry since some of the cases become string equal
under the covers.
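
To make the practical upshot concrete, here is a small sketch of
the two styles inside procs. The proc names matchLiteral and
matchVar are hypothetical, chosen only for this example. Both give
the same answer; only the first lets the compiler see the pattern,
so only it is a candidate for the rewrite described above. How big
the gap is will depend on your Tcl version.

```tcl
# Pattern is a literal in the source, so it is visible at compile time.
proc matchLiteral {s} {
    return [regexp foo $s]
}

# Pattern arrives in a variable, so it is only known at run time.
proc matchVar {pat s} {
    return [regexp $pat $s]
}

set b bar
puts "literal:  [matchLiteral $b]  [time {matchLiteral $b} 1000]"
puts "variable: [matchVar foo $b]  [time {matchVar foo $b} 1000]"
```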
> I believe in Tcl 8.4 that if regexp sees a "simple" pattern (i.e. one
> that string match could use), then it will use string match to process
> it.
Yes. It is not an exhaustive check, but when compiling a
regexp I do checks for static strings, anchored strings and
some handling for ".*" -> "*". Here is an example where
regexp is never regexp under the covers; it is either string
match or string equal:
67 % time {regexp {^foo$} $b} 1000
2 microseconds per iteration
68 % time {string match {*foo*} $b} 1000
3 microseconds per iteration
69 % time {regexp foo $b} 1000
3 microseconds per iteration
70 % time {string equal foo $b} 1000
2 microseconds per iteration
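
The pairs timed above give the same results, which is why the
substitution is safe. A quick sketch to check that on a few sample
strings (the sample values are my own, picked for illustration):

```tcl
# For each sample, compare the regexp form with its string counterpart.
foreach b {foo bar foobar xfoo {}} {
    # Fully anchored static pattern behaves like string equal.
    set anchored [regexp {^foo$} $b]
    set equal    [string equal foo $b]
    # Unanchored static pattern behaves like string match *foo*.
    set unanchored [regexp foo $b]
    set glob       [string match {*foo*} $b]
    if {$anchored != $equal || $unanchored != $glob} {
        error "mismatch for '$b'"
    }
    puts [format "%-8s anchored=%d equal=%d unanchored=%d glob=%d" \
        "'$b'" $anchored $equal $unanchored $glob]
}
```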
All that said, it depends on what the ns_register_proc is doing.
Is it all C code, or are we talking about using regexp vs.
string commands at the Tcl level?
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/