On Fri, 2004-04-16 at 10:27, Lachlan Andrew wrote:
> As I recall, you listed the patterns in a file, and included that file 
> in the htdig.conf file using backquotes.  It just occurred to me that 
> the file listing the patterns is probably being read in each time the 
> attribute is read (each time a url is parsed).
> 
> What happens to the speed if you list the patterns explicitly in the 
> htdig.conf file?

 Yeah, I thought the same thing and had tried that. Same results. From
what I can see in the source of htdig/Retriever.cc (lines 998-1000), the
exclude_urls list is re-parsed for *every* URL:

    //
    // If the URL contains any of the patterns in the exclude list,
    // mark it as invalid
    //
    tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t");
    HtRegexList excludes;
    excludes.setEscaped(tmpList, config->Boolean("case_sensitive"));

 This would explain some of the slowdown. The other place I have been
looking at is the HtRegexList::setEscaped method (htlib/HtRegexList.cc),
which seems to be really expensive, but I've lost touch with C++ and
I'm definitely not a good judge of it anymore.
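
 To illustrate what I mean by not rebuilding the list at every URL, here
is a rough sketch. It is not ht://Dig code: it uses std::regex in place
of HtRegexList, treats each pattern as a plain regular expression (which
may not be what setEscaped does), and the ExcludeCache class and its
members are names I made up. The idea is only to recompile when the raw
attribute string or the case flag actually changes:

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the exclude list. It recompiles its patterns
// only when the raw attribute value or the case flag changes, instead of
// doing so for every URL that is checked.
class ExcludeCache {
public:
    // Returns true if `url` matches any pattern in `rawPatterns`.
    // Patterns are split on whitespace and compiled as ECMAScript
    // regexes (an assumption made for this sketch).
    bool matches(const std::string& url, const std::string& rawPatterns,
                 bool caseSensitive) {
        if (!valid_ || rawPatterns != lastRaw_ || caseSensitive != lastCase_) {
            rebuild(rawPatterns, caseSensitive);
        }
        for (const std::regex& re : compiled_) {
            if (std::regex_search(url, re)) {
                return true;
            }
        }
        return false;
    }

    // Number of times the pattern list has been compiled so far.
    int rebuildCount() const { return rebuilds_; }

private:
    void rebuild(const std::string& rawPatterns, bool caseSensitive) {
        compiled_.clear();
        auto flags = std::regex::ECMAScript;
        if (!caseSensitive) {
            flags |= std::regex::icase;
        }
        std::istringstream in(rawPatterns);
        std::string pattern;
        while (in >> pattern) {
            compiled_.emplace_back(pattern, flags);
        }
        lastRaw_ = rawPatterns;
        lastCase_ = caseSensitive;
        valid_ = true;
        ++rebuilds_;
    }

    std::vector<std::regex> compiled_;
    std::string lastRaw_;
    bool lastCase_ = true;
    bool valid_ = false;
    int rebuilds_ = 0;
};
```

 Keying the cache on the raw string rather than building it once at
startup is deliberate: the lookup in Retriever.cc takes the URL as an
argument, so the value could presumably differ from one URL to the
next, and the cache still has to notice that.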


Cheers,

Chris

-- 
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group 
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017

