According to Christopher Murtagh:
> On Tue, 2004-04-20 at 17:45, Gilles Detillieux wrote:
> > Hi, Chris and other developers.  The problem with this fix is that
> > exclude_urls and bad_querystr can no longer be used in server blocks or
> > URL blocks, as they'll only be parsed once regardless of how they're used.
> > That's OK if you don't use them in blocks, but for the distributed code,
> > we need to find a more generalized solution.  
> 
>  Right. Having just found the block documentation, I can indeed see this
> as a useful feature, and probably something that I would use if the
> performance hit wasn't so bad.
> 
>  One thing I can think of that could help performance quite
> considerably is an array of type HtRegexList * that could hold the
> parsed exclude lists, bad-query lists, etc., per block.  Or perhaps a
> struct containing all the parsed config attributes per block, with an
> array of pointers to it.  That way the config could be loaded and
> still only need to be parsed once.  The only downside I can see is
> that htdig would have a slightly larger memory footprint, but I don't
> really see that as a big problem.  We're probably talking about a
> couple K more; by today's standards, even a couple megs more wouldn't
> be a big deal.

There's an idea worth considering.  It's quite a bit more complicated than
the quick fix I had in mind, but probably much simpler than a full-blown
caching scheme.  It would also help out the case where regex-based
attributes are used in URL or server blocks, which my proposed fix would
only marginally help.

> > 3) We may also need to determine if the repeated calls to config->Find()
> > at each URL are having an impact on performance as well.  E.g. what is
> > the performance cost of doing thousands of calls like this one?
> > 
> >      tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t");
> 
>  Easy thing to test. I'll give it a try later this week if I can,
> perhaps tomorrow, and report back.

Great.  I'll try to get my fix to Regex.cc in by the end of the week too,
so it would be great if you could give it a whirl.  It would probably
mean having to back out your own patch, though, or it wouldn't really
get tested.

Neal, I'd still like your opinion on the matter of making these
HtRegexList variables global, and whether that will be a problem for
libhtdig.  Looking at the code, I see that "limits" and "limitsn", set
by limit_urls_to and limit_normalized, are already global.  But these
are defined in htdig.cc, rather than Retriever.cc.  Does this matter?

I imagine it just means making parallel changes to libhtdig_htdig.cc,
but right now it doesn't even seem to be making use of URL blocks, as
it doesn't pass aUrl to HtConfiguration::Find().  Is this an oversight,
or am I missing something?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
