On Mon, 14 May 2001, Gilles Detillieux wrote:

> What I can't figure out is why there's so many "spontaneous" calls to
> regcomp!  That seems to be where it's spending almost all of its time
> (i.e. in children of regcomp) but the profiling gives no clue as to what

I'd expect a lot of the regcomp calls to come from code like this in
Retriever.cc:

    tmpList.Create(config.Find(&aUrl,"exclude_urls")," \t");
    HtRegexList excludes;
    excludes.setEscaped(tmpList);
    if (excludes.match(url, 0, 0) != 0)
      {

Of course, the question is how bad the performance hit is here. One
possibility is to build excludes and the related pattern lists once per
server and keep the HtRegexList object around, instead of recompiling for
every URL. This would be a *huge* speedup.
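
To make the per-server idea concrete, here is a rough sketch, not the actual
ht://Dig code. PatternList is a hypothetical stand-in for HtRegexList that
treats each entry as a POSIX extended regex (an assumption, the real class
escapes its input), and ExcludeCache, forServer and builds are names made up
for illustration. The point is only that regcomp runs when a server is first
seen, not once per URL.

```cpp
#include <regex.h>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for HtRegexList: compiles every pattern once.
class PatternList {
public:
    explicit PatternList(const std::vector<std::string>& patterns) {
        for (const auto& p : patterns) {
            std::unique_ptr<regex_t> re(new regex_t);
            // Keep only the patterns that regcomp accepts.
            if (regcomp(re.get(), p.c_str(), REG_EXTENDED | REG_NOSUB) == 0)
                compiled_.push_back(std::move(re));
        }
    }
    ~PatternList() {
        for (auto& re : compiled_) regfree(re.get());
    }
    PatternList(const PatternList&) = delete;
    PatternList& operator=(const PatternList&) = delete;

    // True if any compiled pattern matches somewhere in s.
    bool match(const std::string& s) const {
        for (const auto& re : compiled_)
            if (regexec(re.get(), s.c_str(), 0, nullptr, 0) == 0)
                return true;
        return false;
    }

private:
    std::vector<std::unique_ptr<regex_t>> compiled_;
};

// Hypothetical cache: one compiled list per server host name.
class ExcludeCache {
public:
    const PatternList& forServer(const std::string& host,
                                 const std::vector<std::string>& patterns) {
        auto it = cache_.find(host);
        if (it == cache_.end()) {
            ++builds_;  // count how often we actually compile
            it = cache_.emplace(host, std::unique_ptr<PatternList>(
                                          new PatternList(patterns))).first;
        }
        return *it->second;
    }
    int builds() const { return builds_; }

private:
    std::map<std::string, std::unique_ptr<PatternList>> cache_;
    int builds_ = 0;
};
```

With something like this, the retriever would call forServer(host, patterns)
for each URL and only pay for compilation on the first URL of each server.
The sketch assumes the patterns for a server don't change during a dig.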

The other question is whether HtRegexList is fully optimized yet, which
depends on how expensive regcomp is. Right now it tries to build the
longest possible regex and starts a new list entry when regcomp
fails. (Thus the huge number of calls...) This yields the fewest
regexes, but may not be the overall speed win.
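
For illustration, the "longest possible regex" strategy could look roughly
like the sketch below. This is my reading of the approach, not a copy of
HtRegexList; buildGreedy, BuildResult and compiles are hypothetical names.
It shows why the call count grows: every pattern added means recompiling the
whole alternation built so far.

```cpp
#include <regex.h>
#include <string>
#include <vector>

struct BuildResult {
    std::vector<std::string> entries;  // each entry is one "a|b|c" regex
    int regcompCalls = 0;              // how many times regcomp was invoked
};

// Try to compile a pattern, counting the attempt; free it right away.
static bool compiles(const std::string& pattern, int& calls) {
    regex_t re;
    ++calls;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    regfree(&re);
    return true;
}

// Greedy build: keep extending the current alternation until regcomp
// rejects it, then close that entry and start a new one.
BuildResult buildGreedy(const std::vector<std::string>& patterns) {
    BuildResult r;
    std::string current;
    for (const auto& p : patterns) {
        std::string candidate = current.empty() ? p : current + "|" + p;
        if (compiles(candidate, r.regcompCalls)) {
            current = candidate;
            continue;
        }
        if (!current.empty()) {
            // The combined regex failed: save what we had, retry p alone.
            r.entries.push_back(current);
            current.clear();
            if (compiles(p, r.regcompCalls))
                current = p;
        }
        // If current was empty, p failed on its own and is skipped.
    }
    if (!current.empty())
        r.entries.push_back(current);
    return r;
}
```

So N patterns cost at least N regcomp calls on ever longer strings, and that
happens again each time the list is rebuilt. Compiling the patterns in fixed
size batches would trade a few more regexes for fewer and cheaper regcomp
calls, but whether that wins overall would need measuring.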

-Geoff


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev
