[htdig-dev] Re: Performance issue with exclude_urls

Christopher Murtagh Wed, 21 Apr 2004 08:09:12 -0700

On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote:
> Yes, I agree that we need a more "polished" patch for the 
> distribution.  I still like my intermediate path:  If *any* server 
> blocks or URL blocks are used, then the user takes the performance 
> hit and re-parses each time.


 That sounds like a decent plan to me. However, 'performance hit' is a
serious understatement with the current code. Without my patch, my Dual
3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed
to index a little under 1000 pages per hour. Once I stopped re-parsing
the URL list for each URL, the CPU usage dropped to about 6-10%,
essentially making it an I/O bound problem rather than a CPU bound one.
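
 To make the difference concrete, here is a rough sketch in Python (not
the actual ht://Dig code, which is C++). The class name ExcludeFilter,
its cache flag and the compile_count counter are all made up for
illustration; the counter only stands in for the cost of parsing the
exclude list.

```python
import re


class ExcludeFilter:
    """Hypothetical stand-in for an exclude_urls check (not ht://Dig code)."""

    def __init__(self, patterns, cache=True):
        self.patterns = list(patterns)
        self.cache = cache
        self.compile_count = 0  # how many times the list has been parsed
        self._compiled = None

    def _build(self):
        # The expensive step: turn the whole exclude list into one regex.
        self.compile_count += 1
        return re.compile("|".join(re.escape(p) for p in self.patterns))

    def is_excluded(self, url):
        if not self.patterns:
            return False  # nothing to exclude, nothing to parse
        if not self.cache:
            # Old behaviour as described above: re-parse for every URL.
            return self._build().search(url) is not None
        if self._compiled is None:
            # Patched behaviour: parse once, reuse for every later URL.
            self._compiled = self._build()
        return self._compiled.search(url) is not None


if __name__ == "__main__":
    urls = ["http://example.org/page%d.html" % i for i in range(1000)]
    for cache in (False, True):
        f = ExcludeFilter(["/cgi-bin/", ".pdf"], cache=cache)
        excluded = sum(f.is_excluded(u) for u in urls)
        print("cache=%s: list parsed %d times, %d URLs excluded"
              % (cache, f.compile_count, excluded))
```

 Both variants give the same answers; the only thing that changes is how
often the list gets parsed, which is where the CPU time was going.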

 I suspect, as Gilles suggests, that there are probably other
optimizations in the Regex code that could help out a lot in this
matter.

> If *no* server/URL blocks are used, we use Chris's patch.  This should 
> be just as fast as Chris's patch (in the "3.1-compatible mode" without 
> server/URL blocks), and just as flexible as the current status 
> (if blocks are used). 
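
 As I understand the proposal, the choice between the two paths would be
made once, up front. A rough Python sketch of that decision follows; it
is not ht://Dig code, and Config, url_blocks, excludes_for and
make_checker are hypothetical names. Matching a block by URL prefix is
also just an assumption to keep the example short.

```python
import re


class Config:
    """Hypothetical config: a global exclude list plus optional URL blocks."""

    def __init__(self, global_excludes, url_blocks=None):
        self.global_excludes = list(global_excludes)
        # Assumed shape: URL prefix -> exclude list that overrides the global one.
        self.url_blocks = dict(url_blocks or {})

    def excludes_for(self, url):
        for prefix, excludes in self.url_blocks.items():
            if url.startswith(prefix):
                return excludes
        return self.global_excludes


def build(patterns):
    # Parse an exclude list into one regex; None means "exclude nothing".
    if not patterns:
        return None
    return re.compile("|".join(re.escape(p) for p in patterns))


def make_checker(config):
    if not config.url_blocks:
        # No blocks: the list cannot vary per URL, so parse it exactly once.
        compiled = build(config.global_excludes)
        return lambda url: bool(compiled and compiled.search(url))

    def check(url):
        # Blocks in use: the effective list may differ per URL, so the
        # user takes the hit and it is rebuilt each time.
        compiled = build(config.excludes_for(url))
        return bool(compiled and compiled.search(url))

    return check
```

 A config without blocks gets the fast path; adding a single block
switches that config over to the flexible, slower one.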

 I can't believe it took me until today to find the block configuration
in the documentation. Once I found it, it seemed to be in an obvious
place, but perhaps it needs a mention as a feature on the front page of
the docs?

 
Cheers,

Chris

-- 
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group 
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


