According to Geoff Hutchison:
> On Mon, 14 May 2001, Gilles Detillieux wrote:
> > What I can't figure out is why there's so many "spontaneous" calls to
> > regcomp!  That seems to be where it's spending almost all of its time
> > (i.e. in children of regcomp) but the profiling gives no clue as to what
> 
> I expect a lot of calls to regcomp from these sorts of calls in
> Retriever.cc:
> 
>     tmpList.Create(config.Find(&aUrl,"exclude_urls")," \t");
>     HtRegexList excludes;
>     excludes.setEscaped(tmpList);
>     if (excludes.match(url, 0, 0) != 0)
>       {

That should only amount to 1 or 2 calls per URL.  Joe was getting tens of
millions of calls for a few hundred URLs, which doesn't make sense.  Also,
the calls to regcomp from setEscaped should be attributed to their caller
in the profile, but these millions of calls were labelled "spontaneous".
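
To illustrate the "1 or 2 calls per URL" estimate, here is a minimal sketch.
It is not the actual HtRegexList code; it assumes plain POSIX regcomp() and
regexec(), and the names g_regcomp_calls, compileCounted, escapeLiteral and
matchesAny are made up for the example.  Joining the escaped words into one
alternation should cost a single regcomp() per URL checked:

```cpp
#include <cassert>
#include <regex.h>
#include <string>
#include <vector>

// Hypothetical counter: bumped on every regcomp() made through compileCounted().
static long g_regcomp_calls = 0;

// Thin wrapper around regcomp() so that every compile is counted.
static int compileCounted(regex_t *re, const std::string &pattern)
{
    ++g_regcomp_calls;
    return regcomp(re, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
}

// Backslash-escape the ERE special characters so a word matches literally.
static std::string escapeLiteral(const std::string &word)
{
    static const std::string meta = "\\^$.[|()*+?{";
    std::string out;
    for (char c : word) {
        if (meta.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// True if url contains any of the words.  Exactly one regcomp() per call
// (none if there is nothing to compile).
static bool matchesAny(const std::vector<std::string> &words,
                       const std::string &url)
{
    std::string pattern;
    for (const std::string &w : words) {
        if (w.empty())
            continue;               // avoid an empty alternative
        if (!pattern.empty())
            pattern += '|';
        pattern += escapeLiteral(w);
    }
    if (pattern.empty())
        return false;

    regex_t re;
    if (compileCounted(&re, pattern) != 0)
        return false;               // compile failed; treat as no match
    bool hit = (regexec(&re, url.c_str(), 0, nullptr, 0) == 0);
    regfree(&re);
    return hit;
}
```

With a counter like this, a few hundred URLs would show a few hundred
compiles, nowhere near millions.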

> Of course the question is how bad the performance hit is here. One
> possibility is to do excludes and company on a per-server basis and save
> the HtRegexList object. This would be a *huge* speedup.
> 
> The other question is whether HtRegexList is yet fully optimized depending
> on how painful it is to call regcomp. Right now it tries to make the
> longest possible regex and will make a new list entry when regcomp
> fails. (Thus the huge number of calls...) This makes the shortest # of
> regex, but may not be the overall speed win.

The cumulative times in HtRegexList don't amount to a whole lot, according
to the profile output Joe posted.  It seems regcomp() is being called from
elsewhere, but where the calls come from doesn't appear in the data.
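
For reference, the per-server idea quoted above could look roughly like the
sketch below.  This is only an illustration under stated assumptions: the
class ServerRegexCache and its methods are hypothetical, it uses POSIX
regcomp() directly rather than HtRegexList, and it assumes the pattern for a
given server does not change during a run.

```cpp
#include <cassert>
#include <map>
#include <regex.h>
#include <string>

// Hypothetical cache: one compiled pattern per server name.
class ServerRegexCache {
public:
    ServerRegexCache() = default;
    ServerRegexCache(const ServerRegexCache &) = delete;
    ServerRegexCache &operator=(const ServerRegexCache &) = delete;

    ~ServerRegexCache()
    {
        for (auto &entry : cache_)
            regfree(&entry.second);
    }

    // Returns the compiled regex for this server, compiling it on the
    // first request only.  Returns nullptr if regcomp() fails.
    // Assumption: later calls for the same server pass the same pattern,
    // so the pattern argument is ignored on a cache hit.
    const regex_t *get(const std::string &server, const std::string &pattern)
    {
        auto it = cache_.find(server);
        if (it != cache_.end())
            return &it->second;

        regex_t &slot = cache_[server];   // compile in place, no struct copy
        ++compiles_;
        if (regcomp(&slot, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
            cache_.erase(server);         // do not keep a failed entry
            return nullptr;
        }
        return &slot;
    }

    // Number of regcomp() calls made so far.
    long compiles() const { return compiles_; }

private:
    std::map<std::string, regex_t> cache_;
    long compiles_ = 0;
};
```

The number of compiles then grows with the number of servers rather than the
number of URLs.  Whether that helps in Joe's case is another matter, since
the profile does not point at HtRegexList.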

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev
