On Fri, 2004-04-16 at 10:27, Lachlan Andrew wrote:
> As I recall, you listed the patterns in a file, and included that file
> in the htdig.conf file using backquotes. It just occurred to me that
> the file listing the patterns is probably being read in each time the
> attribute is read (each time a url is parsed).
>
> What happens to the speed if you list the patterns explicitly in the
> htdig.conf file?
Yeah, I thought the same thing and tried that. Same results. From what
I can see in htdig/Retriever.cc (lines 998-1000), the exclude_urls list
is re-parsed for *every* URL:
//
// If the URL contains any of the patterns in the exclude list,
// mark it as invalid
//
tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t");
HtRegexList excludes;
excludes.setEscaped(tmpList, config->Boolean("case_sensitive"));
This would explain at least part of the slowdown. The other place I
have been looking at is the HtRegexList::setEscaped method
(htlib/HtRegexList.cc), which looks really expensive, but I've lost
touch with C++ and I'm definitely not a good judge of it anymore.
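
For illustration, here is a minimal sketch of how the per-URL rebuild
could be avoided by compiling the pattern list once and reusing it. It
is not ht://Dig code: ExcludeList and cached_excludes are hypothetical
names, std::regex stands in for HtRegexList, and the patterns are
treated as plain regular expressions (the escaping that setEscaped does
is not reproduced). Because the Find(&aUrl, ...) call suggests the
attribute value may differ from one URL to the next, the cache is keyed
on the pattern string rather than built a single time.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HtRegexList: compiles a whitespace-separated
// pattern list once, in the constructor.
class ExcludeList {
public:
    ExcludeList(const std::string& patterns, bool case_sensitive) {
        auto flags = std::regex::ECMAScript | std::regex::optimize;
        if (!case_sensitive) flags |= std::regex::icase;
        std::istringstream in(patterns);
        std::string pattern;
        while (in >> pattern)               // split on spaces and tabs
            compiled_.emplace_back(pattern, flags);
    }

    // True if any pattern matches somewhere in the URL.
    bool matches(const std::string& url) const {
        for (const auto& re : compiled_)
            if (std::regex_search(url, re)) return true;
        return false;
    }

private:
    std::vector<std::regex> compiled_;
};

// Returns a compiled list for this attribute value, building it only the
// first time that value is seen. Not thread-safe.
const ExcludeList& cached_excludes(const std::string& patterns,
                                   bool case_sensitive) {
    static std::map<std::pair<std::string, bool>, ExcludeList> cache;
    auto key = std::make_pair(patterns, case_sensitive);
    auto it = cache.find(key);
    if (it == cache.end())
        it = cache.emplace(key, ExcludeList(patterns, case_sensitive)).first;
    return it->second;
}
```

With something like this, the per-URL check becomes a map lookup plus
the regex matches, e.g. cached_excludes(value, caseSensitive).matches(url),
and the expensive compilation happens once per distinct attribute value.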
Cheers,
Chris
--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada
Tel.: (514) 398-3122
Fax: (514) 398-2017