Greetings Chris and team,

Profiling should help a lot here.
One possibility is that the slowdown occurs because ht://Dig now allows attributes to be specified on a per-host basis. As a result, all of the exclude_urls patterns are re-compiled for each URL checked (even if you don't specify host-dependent attributes). If (for example) there is a memory leak in the regex compiler, this would slow the code down as you describe.

Even if it is not causing Chris's problem, I've been thinking for a long time that this re-parsing may partially account for the big slowdown in digging with recent 3.2.0 betas.

From speaking with Gabriele, I know that he finds the feature very useful. However, I can't find the config file format documented, so I don't think that many people can benefit from it, and it is currently just (unquantified) bloat.

If Chris finds that this is in fact the problem, I suggest that:

1. We set a flag if per-host attributes are used at all.
2. If no per-host attributes are used, all expensive-to-parse attributes (like regular expressions) should be cached in their parsed forms.

Thoughts?

Lachlan

On Thu, 8 Apr 2004 01:52, Christopher Murtagh wrote:
> I've noticed that adding URLs *seriously* degrades
> digging performance. To a point that with 30 or so patterns, I got
> 8k pages in over 8 hours, and without them, I could do 15k pages in
> an hour.
>
> With such a drastic difference, I'm assuming that there's a bug
> somewhere. I'll try to go digging through the code to find it, but
> I imagine that someone on this list will have better luck than me.
> :-)

--
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)
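
P.S. To make the two suggestions above concrete, here is a rough sketch in Python. It is not ht://Dig code (ht://Dig itself is C++), and everything in it is an assumption made for illustration: the class name ExcludeMatcher, its fields, the host-to-patterns mapping, and the treatment of exclude_urls patterns as plain regular expressions. The compile_count field exists only so the effect of the cache can be observed.

```python
import re


class ExcludeMatcher:
    """Hypothetical helper that caches compiled exclude patterns."""

    def __init__(self, global_patterns, per_host_patterns=None):
        self.global_patterns = list(global_patterns)
        # Hypothetical mapping: host -> pattern list that overrides the global one.
        self.per_host_patterns = dict(per_host_patterns or {})
        # Suggestion 1: a flag recording whether any per-host attributes exist.
        self.per_host_used = bool(self.per_host_patterns)
        self._cache = {}        # cache key -> compiled regex (or None if no patterns)
        self.compile_count = 0  # instrumentation for this example only

    def _compiled_for(self, host):
        # Suggestion 2: with no per-host attributes, every URL shares a single
        # cache key, so the patterns are compiled once rather than per URL.
        key = host if self.per_host_used else None
        if key not in self._cache:
            patterns = self.per_host_patterns.get(host, self.global_patterns)
            self.compile_count += 1
            if patterns:
                combined = "|".join(f"(?:{p})" for p in patterns)
                self._cache[key] = re.compile(combined)
            else:
                self._cache[key] = None
        return self._cache[key]

    def is_excluded(self, host, url):
        regex = self._compiled_for(host)
        return regex is not None and regex.search(url) is not None


# Usage: many URLs checked, one compilation when no per-host attributes are set.
matcher = ExcludeMatcher([r"/cgi-bin/", r"\.cgi$"])
for i in range(1000):
    matcher.is_excluded("www.example.com", f"http://www.example.com/page{i}.html")
print(matcher.compile_count)
```

When per-host patterns are present, the sketch still caches, but keyed by host, so the cost is one compilation per host instead of one per URL. Whether that maps cleanly onto the real attribute lookup is something profiling would have to confirm.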