According to Joe R. Jah:
> On Fri, 30 Nov 2001, Gilles Detillieux wrote:
> > I don't think the difference between 99 and 104 seconds is significant.
> > This confirms my suspicion that the HAVE_BROKEN_REGEX doesn't do a
> > whole lot.  To be sure, though, I think we'd need timings for 112501 +
> > parsedate.0 + ssl.6, remove reference to regex.o in htlib/Makefile, #undef
> > AND #define HAVE_BROKEN_REGEX (i.e. two tests) in include/htconfig.h
> > (but don't remove htlib/regex.h).  I suspect the timings for both will
> > be like the 2nd test above, around 143 sec.
> 
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
> 
> htdig: Start digging:   Sat Dec 1 00:10:58 PST 2001
> htmerge: Start merging: Sat Dec 1 00:12:44 PST 2001   106
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
> 
> htdig: Start digging:   Sat Dec 1 00:18:55 PST 2001
> htmerge: Start merging: Sat Dec 1 00:20:38 PST 2001   103
...

OK, these are all around 100 sec, so I guess the main thing is to make
sure the bundled htlib/regex.c isn't compiled, so that its regex.o
doesn't end up in htlib/htlib.a.  Removing the reference to regex.o in
the Makefile seems to be the key.
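
For what it's worth, the elapsed figures quoted above are just the gap
between the "Start digging" and "Start merging" timestamps.  A minimal
Python sketch of that arithmetic, assuming the date format shown in the
logs (the name elapsed_seconds is made up for illustration, and the
timezone token is simply dropped since both stamps share it):

```python
from datetime import datetime

# Format of the stamps once the timezone token has been removed,
# e.g. "Sat Dec 1 00:10:58 2001" (assumes an English/C locale).
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(stamp):
    """Parse a stamp like 'Sat Dec 1 00:10:58 PST 2001', ignoring the zone."""
    parts = stamp.split()
    del parts[4]  # drop the timezone token (e.g. "PST")
    return datetime.strptime(" ".join(parts), STAMP_FORMAT)


def elapsed_seconds(dig_start, merge_start):
    """Seconds between the start of digging and the start of merging."""
    delta = parse_stamp(merge_start) - parse_stamp(dig_start)
    return int(delta.total_seconds())
```

Fed the first pair of timestamps above, this gives 106.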

> > I suspect the difference between the 143 and the 99-104 sec is due
> > to the inclusion of the bundled regex.h even though you're using
> > the C library regex.o code.  It's a wonder this works at all, but
> > there does seem to be some impact on performance.
> 
> I am not sure how that 143 came about last time; I can't reproduce it any
> more;-/

Probably some other system activity, or fewer pages in the disk cache
when you ran that test.  Are you getting times closer to 100 sec now?
That would stand to reason.

However, to be on the safe side, I think the code should make sure it
doesn't use the bundled regex.h if it doesn't use the bundled regex.c.
If you mix and match them, there may be problems in some cases we haven't
discovered yet.  Geoff said he'd look into what other packages do for
regex support.

> > > ____________________ 092301 + Armstrong + ssl.4 ____________________
> > > htdig: Start digging:   Fri Nov 30 00:18:06 PST 2001
> > > htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001     38 seconds
> > ...
> > 
> > This is the part I find a bit troubling, but I don't know what we
> > can do about it.  I don't know why Armstrong's patch, which uses rx
> > instead of regex, causes htdig to run 2-3 times faster, unless there
> > are other changes between 092301 and 112501 that account for much of
> > this, but it could well be just implementation efficiencies in one
> > library and not in the other.
> 
> I reported the difference in indexing time to the list the very first time
> url_rewrite_rules was integrated in the code.  I don't believe at that
> time anything else had changed in the code.

Right you are.  The Sep 23 snapshot was just before I committed Geoff's
changes for url_rewrite_rules using regex.  Since then, very little
has changed that should affect htdig performance.  I was thinking back
to when your Armstrong patch benchmarks were run on a snapshot from
early or mid-August, before I had committed a number of parser changes.

> > In your tests above, do you make use of url_rewrite_rules?  If so,
> > how do the timings change if you don't use it?
> 
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
> 
> htdig: Start digging:   Sat Dec 1 00:40:09 PST 2001
> htmerge: Start merging: Sat Dec 1 00:40:34 PST 2001    25 seconds
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
> 
> htdig: Start digging:   Sat Dec 1 00:28:50 PST 2001
> htmerge: Start merging: Sat Dec 1 00:29:10 PST 2001    20 seconds
...

OK, I don't think that 5-second difference can be treated as significant
given the variations in timings we've seen for other tests.  The only
way to get more meaningful results would be to run each test several
times and take the mean run time.
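
A rough sketch of what I mean by that, in Python.  The helper names
(time_command, summarize) and the default of 5 runs are my own
assumptions, not anything that ships with ht://Dig; the command to time
is whatever you normally run for the test:

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so only the elapsed time matters.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the run times."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two data points.
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread
```

If the means of two configurations differ by less than the spread of
either one, I wouldn't read anything into the difference.  Note that the
first run of a series may still be slower than the rest because of the
disk cache effects mentioned earlier.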

It is good to know that the latest code doesn't bog down when you're not
using url_rewrite_rules.  That suggests we're not seeing the sort of
weirdness we were seeing in your profiling of 3.2 several months ago,
with the millions of unexplained calls to regcomp.
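
To illustrate why the number of regcomp calls matters: compiling a
pattern is usually far more expensive than matching with it, so rewrite
rules should be compiled once and then reused for every URL.  Here is a
sketch of that idea in Python.  It is not how htdig implements
url_rewrite_rules; the class name, the example rule, and the choice to
apply every rule in order are all assumptions for illustration:

```python
import re


class RewriteRules:
    """Hypothetical rule set: (pattern, replacement) pairs, compiled once."""

    def __init__(self, rules):
        # Compile each pattern a single time, up front, instead of
        # recompiling it for every URL that gets processed.
        self._compiled = [(re.compile(pattern), replacement)
                          for pattern, replacement in rules]

    def rewrite(self, url):
        """Apply every rule, in order, to the URL and return the result."""
        for pattern, replacement in self._compiled:
            url = pattern.sub(replacement, url)
        return url


# Example rule (made up): strip a leading "www." from one host name.
rules = RewriteRules([(r"^http://www\.example\.com/", "http://example.com/")])
```

With this arrangement the cost of compilation is paid once per rule, no
matter how many URLs pass through rewrite().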

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
