According to Geoff Hutchison:
> On Mon, 14 May 2001, Gilles Detillieux wrote:
> > What I can't figure out is why there's so many "spontaneous" calls to
> > regcomp! That seems to be where it's spending almost all of its time
> > (i.e. in children of regcomp) but the profiling gives no clue as to what
>
> I expect a lot of calls to regcomp from these sorts of calls in
> Retriever.cc:
>
> tmpList.Create(config.Find(&aUrl,"exclude_urls")," \t");
> HtRegexList excludes;
> excludes.setEscaped(tmpList);
> if (excludes.match(url, 0, 0) != 0)
> {

That should only amount to 1 or 2 calls per URL. Joe was getting tens of
millions of calls for a few hundred URLs, which doesn't make sense. Also,
the calls to regcomp from setEscaped should be attributed to their caller
in the profile, but these millions of calls were labelled "spontaneous".

> Of course the question is how bad the performance hit is here. One
> possibility is to do excludes and company on a per-server basis and save
> the HtRegexList object. This would be a *huge* speedup.
>
> The other question is whether HtRegexList is yet fully optimized depending
> on how painful it is to call regcomp. Right now it tries to make the
> longest possible regex and will make a new list entry when regcomp
> fails. (Thus the huge number of calls...) This makes the shortest # of
> regex, but may not be the overall speed win.

The cumulative times in HtRegexList don't amount to much, according to
the profile output Joe posted. It seems regcomp() is being called from
elsewhere, but the profile data doesn't show where those calls come from.
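
To make the "longest possible regex" strategy described above concrete,
here is a minimal sketch of how such a greedy merge could work, with a
counter on regcomp calls. This is not the actual HtRegexList code: the
names compiles(), combine_greedy() and regcomp_calls are hypothetical, and
the only real API used is POSIX regcomp()/regfree().

```cpp
#include <regex.h>   // POSIX regcomp/regfree
#include <string>
#include <vector>

// Hypothetical counter for this sketch; ht://Dig has no such variable.
static long regcomp_calls = 0;

// Returns true if `pattern` compiles as a POSIX extended regex.
static bool compiles(const std::string& pattern) {
    regex_t re;
    ++regcomp_calls;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;          // on failure there is nothing to free
    regfree(&re);
    return true;
}

// Greedily joins patterns with '|', starting a new entry whenever the
// combined expression no longer compiles. Patterns that do not compile
// on their own are dropped.
static std::vector<std::string>
combine_greedy(const std::vector<std::string>& patterns) {
    std::vector<std::string> entries;
    std::string current;
    for (const std::string& p : patterns) {
        if (current.empty()) {
            if (compiles(p)) current = p;
            continue;
        }
        std::string candidate = current + "|" + p;
        if (compiles(candidate)) {
            current = candidate;           // keep growing this entry
        } else {
            entries.push_back(current);    // close the entry that worked
            current = compiles(p) ? p : std::string();
        }
    }
    if (!current.empty()) entries.push_back(current);
    return entries;
}
```

In this sketch, combine_greedy({"foo", "bar", "baz"}) makes three regcomp
calls and returns the single entry "foo|bar|baz". The call count grows
roughly with the number of patterns in the list, so a merge of this kind
would be rebuilt once per URL if the list is not cached.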
--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930