Greetings Chris and team,

Profiling should help a lot here.

One possible cause is that ht://Dig now allows attributes to be 
specified on a per-host basis.  As a result, all of the exclude_urls 
patterns are re-compiled for each URL checked (even if you don't 
specify host-dependent attributes).  If, for example, there is a 
memory leak in the regex compiler, that would slow the code down as 
you describe.

Even if it is not causing Chris's problem, I've been thinking for a 
long time that this re-parsing may partially account for the big 
slowdown in digging with recent 3.2.0 betas.  From speaking with 
Gabriele, I know that he finds the feature very useful.  However, I 
can't find the config file format documented, so I don't think that 
many people can benefit from it, and it is currently just 
(unquantified) bloat.

If Chris finds that this is in fact the problem, I suggest that:
1. We set a flag if per-host attributes are used at all.
2. If no per-host attributes are used, we cache all
   expensive-to-parse attributes (like regular expressions) in their
   parsed forms.
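
The two steps above could be sketched roughly as follows.  This is 
only an illustration in Python, not ht://Dig code: the class 
ExcludeMatcher, its methods, and the compile_count counter are all 
hypothetical names, and it assumes the exclude_urls patterns are 
treated as regular expressions.

```python
import re


class ExcludeMatcher:
    """Hypothetical sketch of the proposal; not taken from ht://Dig."""

    def __init__(self, global_patterns, per_host_patterns=None):
        self.global_patterns = list(global_patterns)
        self.per_host_patterns = dict(per_host_patterns or {})
        # Step 1: set a flag once, at configuration time, recording
        # whether any per-host attributes are used at all.
        self.uses_per_host = bool(self.per_host_patterns)
        # Counter kept only so the saving can be observed.
        self.compile_count = 0
        # Step 2: with no per-host attributes, parse the patterns once
        # and keep the compiled form for every later URL check.
        self._cached = None
        if not self.uses_per_host:
            self._cached = self._compile(self.global_patterns)

    def _compile(self, patterns):
        """Join the patterns into one alternation and compile it."""
        self.compile_count += 1
        if not patterns:
            return None
        return re.compile("|".join("(?:%s)" % p for p in patterns))

    def is_excluded(self, host, url):
        """Return True if url matches an exclude pattern for host."""
        if self.uses_per_host:
            # Slow path: patterns may differ per host, so they are
            # re-parsed on every check (the behaviour described above).
            patterns = self.per_host_patterns.get(host, self.global_patterns)
            regex = self._compile(patterns)
        else:
            # Fast path: reuse the form compiled in __init__.
            regex = self._cached
        return regex is not None and regex.search(url) is not None
```

With no per-host patterns, the expression is compiled once no matter 
how many URLs are checked.  A further refinement would be to cache 
one compiled form per host on the slow path as well, but that goes 
beyond the two steps proposed here.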

Thoughts?

Lachlan

On Thu, 8 Apr 2004 01:52, Christopher Murtagh wrote:

> I've noticed that adding URLs *seriously* degrades
> digging performance. To a point that with 30 or so patterns, I got
> 8k pages in over 8 hours, and without them, I could do 15k pages in
> an hour.
>
>  With such a drastic difference, I'm assuming that there's a bug
> somewhere. I'll try to go digging through the code to find it, but
> I imagine that someone on this list will have better luck than me.
> :-)

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)

