According to Christopher Murtagh:
> On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote:
> > Yes, I agree that we need a more "polished" patch for the 
> > distribution.  I still like my intermediate path:  If *any* server 
> > blocks or URL blocks are used, then the user takes the performance 
> > hit and re-parses each time.
> 
>  That sounds like a decent plan to me. However, 'performance hit' is a
> serious understatement with the current code. Without my patch, my Dual
> 3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed
> to index a little under 1000 pages per hour. Once I stopped re-parsing
> the URL list for each URL, the CPU usage dropped to about 6-10%,
> essentially making it an I/O bound problem rather than a CPU bound one.
> 
>  I suspect, as Gilles suggests, that there are probably other
> optimizations in the Regex code that could help out a lot in this
> matter.

Yes, what I initially had in mind was pretty simple.  The Regex object
already stores the compiled pattern for the last value used.  I'd just add
to that object the string that was compiled to get that binary pattern,
so that we could avoid repeatedly compiling the same pattern.  That,
in itself, would be a trivial addition.  It wouldn't be a full caching
scheme with multiple patterns stored, so we don't need to worry about
cache flushing.
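
A minimal sketch of that idea, with std::regex standing in for whatever
the Regex object wraps internally.  The class and member names here are
hypothetical, not the actual htdig code:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper, NOT htdig's real Regex class: it remembers the
// source string of the last compiled pattern and skips recompilation
// when asked for the same string again.
class CachedRegex {
public:
    // Compile `pattern` only if it differs from the last one compiled.
    void set(const std::string& pattern) {
        if (valid_ && pattern == source_)
            return;                          // same source string: reuse
        compiled_ = std::regex(pattern, std::regex::extended);
        source_ = pattern;                   // remember what we compiled
        valid_ = true;
        ++compile_count_;
    }

    // True if the current pattern matches anywhere in `text`.
    bool matches(const std::string& text) const {
        assert(valid_);                      // set() must be called first
        return std::regex_search(text, compiled_);
    }

    // Number of real compilations so far (for illustration only).
    int compile_count() const { return compile_count_; }

private:
    std::regex  compiled_;
    std::string source_;
    bool        valid_ = false;
    int         compile_count_ = 0;
};
```

Only one pattern is ever held, so there is nothing to flush: a new
string simply replaces the old one.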

However, it occurs to me that this fix wouldn't be a huge help to users
who do use exclude_urls or bad_querystr within URL or server blocks.
The reason for this is that htdig alternates between servers to
distribute the load, so after using a pattern from one URL or server
block it quickly moves on to another server with a different pattern,
forcing a recompile each time.
Still, it would likely be an improvement over what it does now, for the
vast majority of users.
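
To illustrate the point, here is a rough sketch that counts how many
compiles a one-entry cache would do for a given sequence of requested
patterns.  The function name is made up, and plain strings stand in for
the per-block patterns:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: simulate a cache that remembers only the last
// pattern string, and count how often a compile would be needed.
std::size_t count_compiles(const std::vector<std::string>& requested) {
    std::string last;
    bool have_last = false;
    std::size_t compiles = 0;
    for (const std::string& pattern : requested) {
        if (!have_last || pattern != last) {
            last = pattern;      // cache miss: "compile" and remember
            have_last = true;
            ++compiles;
        }
    }
    return compiles;
}
```

With a single global pattern, any number of URLs costs one compile.
When two servers with different patterns alternate, every URL costs a
compile, which is the case a one-entry cache can't help with.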

> > If *no* server/URL blocks are used, we use Chris's patch.  This should 
> > be just as fast as Chris's patch (in the "3.1-compatible mode" without 
> > server/URL blocks), and just as flexible as the current status 
> > (if blocks are used). 
> 
>  I can't believe it took me until today to find the block configuration
> in the documentation. Once I found it, it seems to be in an obvious
> place, but perhaps it needs a mention as a feature in the front page of
> the docs?

Yeah, it's not very prominent in the docs right now, and it ought to be
made more obvious.  Most users of 3.2 don't even know about this feature.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

