On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote:
> Yes, I agree that we need a more "polished" patch for the
> distribution. I still like my intermediate path: If *any* server
> blocks or URL blocks are used, then the user takes the performance
> hit and re-parses each time.

That sounds like a decent plan to me. However, 'performance hit' is a
serious understatement with the current code. Without my patch, my dual
3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed
to index a little under 1000 pages per hour. Once I stopped re-parsing
the URL list for each URL, the CPU usage dropped to about 6-10%,
essentially making it an I/O-bound problem rather than a CPU-bound one.
I suspect, as Gilles suggests, that there are probably other
optimizations in the Regex code that could help a lot here.

> If *no* server/URL blocks are used, we use Chris's patch. This should
> be just as fast as Chris's patch (in the "3.1-compatibly mode" without
> server/URL blocks), and just as flexible as the current status
> (if blocks are used).

I can't believe it took me until today to find the block configuration
in the documentation. Now that I have found it, it seems to be in an
obvious place, but perhaps it needs a mention as a feature on the front
page of the docs?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
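
The trade-off discussed in this thread (parse the URL pattern list once when no server/URL blocks are in use, re-parse per URL only when they are) can be sketched roughly as below. This is a minimal illustration in Python, not ht://Dig's actual C++ code: `UrlFilter`, `uses_blocks`, `parse_count`, and the example patterns are all hypothetical names chosen for the sketch.

```python
import re


class UrlFilter:
    """Hypothetical stand-in for a crawler's URL exclusion check."""

    def __init__(self, patterns, uses_blocks=False):
        self.patterns = list(patterns)
        # Assumption: True means per-server/per-URL blocks may change
        # the effective pattern list from one URL to the next.
        self.uses_blocks = uses_blocks
        self.parse_count = 0   # how many times the list was parsed
        self._compiled = None  # cache used only when no blocks exist

    def _parse(self):
        # Stands in for the expensive "parse the pattern list" step.
        self.parse_count += 1
        return [re.compile(p) for p in self.patterns]

    def is_excluded(self, url):
        if self.uses_blocks:
            # Blocks in use: accept the cost and re-parse for every URL.
            compiled = self._parse()
        else:
            # No blocks: parse once, then reuse the compiled list.
            if self._compiled is None:
                self._compiled = self._parse()
            compiled = self._compiled
        return any(rx.search(url) for rx in compiled)


if __name__ == "__main__":
    urls = [
        "http://example.com/a.pdf",
        "http://example.com/index.html",
        "http://example.com/private/x.html",
    ]
    fast = UrlFilter([r"\.pdf$", r"/private/"])
    slow = UrlFilter([r"\.pdf$", r"/private/"], uses_blocks=True)
    for u in urls:
        fast.is_excluded(u)
        slow.is_excluded(u)
    print("parses without blocks:", fast.parse_count)
    print("parses with blocks:", slow.parse_count)
```

In the sketch the number of parses stays constant without blocks and grows with the number of URLs when blocks are enabled, which is the same shape as the CPU-bound versus I/O-bound difference described above.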