On Wed, 21 Apr 2004, Lachlan Andrew wrote:

> Date: Wed, 21 Apr 2004 23:13:27 +1000
> From: Lachlan Andrew <[EMAIL PROTECTED]>
> To: Gilles Detillieux <[EMAIL PROTECTED]>,
>      Christopher Murtagh <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: [htdig-dev] Re: Performance issue with exclude_urls
> 
> Greetings Gilles + all,
> 
> Yes, I agree that we need a more "polished" patch for the 
> distribution.  I still like my intermediate path:  If *any* server 
> blocks or URL blocks are used, then the user takes the performance 
> hit and re-parses each time.  If *no* server/URL blocks are used, we 
> use Chris's patch.  This should be just as fast as Chris's patch (in 
> the "3.1-compatibly mode" without server/URL blocks), and just as 
> flexible as the current status (if blocks are used).  If that can get 
> ht://Dig fast enough to get into sarge, then I suggest we implement 
> it first, and then work on Gilles's more complete solution at more 
> leisure.
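
That split sounds reasonable to me.  Just to make sure we mean the same
thing, here is roughly how I picture it (sketch only: the names are mine,
std::regex stands in for HtRegex, and nothing below is lifted from either
patch):

// Sketch only: compile the pattern list once when no server/URL block
// redefines it, and fall back to re-compiling per URL when one does.
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Compile a space-separated exclude_urls-style value into patterns.
static std::vector<std::regex> compilePatterns(const std::string &value)
{
    std::vector<std::regex> patterns;
    std::istringstream in(value);
    std::string pat;
    while (in >> pat)
        patterns.emplace_back(pat);
    return patterns;
}

// With no server/URL blocks, compile the global value once and reuse it
// on every call (Chris's fast path).  With blocks present, re-compile the
// effective per-server value every time, as the current 3.2 code does.
static bool isExcluded(const std::string &url,
                       const std::string &effectiveValue,
                       bool haveServerOrUrlBlocks)
{
    static const std::vector<std::regex> cached =
        compilePatterns(effectiveValue);

    std::vector<std::regex> fresh;
    if (haveServerOrUrlBlocks)
        fresh = compilePatterns(effectiveValue);
    const std::vector<std::regex> &patterns =
        haveServerOrUrlBlocks ? fresh : cached;

    for (const auto &re : patterns)
        if (std::regex_search(url, re))
            return true;
    return false;
}

The point being that the one-time compile only happens in the no-blocks
case, so we keep Chris's speed there, while the blocks case behaves just
as 3.2 does now.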

I applied Chris's patch and ran htdig on the same site as before for
profiling; htdig ran ~40% faster than last time. ;)  Here is the profile:

 ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.exclude_perform.gz

> A first hack at this (not even compile-tested) is attached, as a patch 
> relative to Chris's patched version, so you can see what I mean.  If 
> people are in favour, I'll try to work on it over the weekend.

The "slightly-better.0" patch applies, but it does not compile:

Retriever.cc: In method `int Retriever::IsValidURL(const String &)':
Retriever.cc:998: `config_server_URL_blocks' undeclared (first use this function)
Retriever.cc:998: (Each undeclared identifier is reported only once
Retriever.cc:998: for each function it appears in.)
gmake[1]: *** [Retriever.o] Error 1
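
At a guess it just needs that flag declared somewhere; I have not looked
at what you intend it to mean, so treat the following as a placeholder
rather than the real fix:

// Placeholder guess only: a flag set once at startup, true when any
// server/URL block redefines the URL-rejection attributes.  Whether it
// belongs here, in Retriever.h, or in the config code is your call.
static bool config_server_URL_blocks = false;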

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]

> One issue with caching input strings is that we would need some sort 
> of cache-flushing, or just let the storage grow as HtRegEx 
> is called repeatedly.
> 
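
On the cache-flushing point: a small bounded cache keyed on the input
string might be enough to keep the storage from growing without limit.
Rough sketch below; "RegexCache" and everything in it are invented names,
and std::regex again stands in for the real HtRegex compilation:

// Sketch only: one way of bounding the growth of a pattern cache.
#include <cstddef>
#include <list>
#include <map>
#include <regex>
#include <string>
#include <utility>

class RegexCache
{
public:
    explicit RegexCache(size_t maxEntries) : max_(maxEntries) {}

    // Return a compiled pattern, compiling it on first use.  When the
    // cache is full, the least recently used entry is evicted, so the
    // storage cannot grow without bound over repeated calls.
    const std::regex &get(const std::string &pattern)
    {
        std::map<std::string, Entries::iterator>::iterator it =
            index_.find(pattern);
        if (it != index_.end())
        {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark as recent
            return it->second->second;
        }
        if (!lru_.empty() && lru_.size() >= max_)
        {
            index_.erase(lru_.back().first);              // evict oldest
            lru_.pop_back();
        }
        lru_.push_front(std::make_pair(pattern, std::regex(pattern)));
        index_[pattern] = lru_.begin();
        return lru_.front().second;
    }

    void flush() { lru_.clear(); index_.clear(); }        // explicit flush

private:
    typedef std::list<std::pair<std::string, std::regex> > Entries;
    size_t max_;
    Entries lru_;
    std::map<std::string, Entries::iterator> index_;
};

Compiling through something like this instead of re-parsing the
expression for every URL would keep the "only parsed once" speed, while
flush() gives us an explicit way to empty the store if we ever need to.
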
> Cheers,
> Lachlan
> 
> On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote:
> > Hi, Chris and other developers.  The problem with this fix is that
> > exclude_urls and bad_querystr can no longer be used in server
> > blocks or URL blocks, as they'll only be parsed once regardless of
> > how they're used.


