On Wed, Jun 02, 2004 at 11:29:47AM -0500, Timm Murray wrote:
> At 09:22 AM 6/2/04 -0700, Bill Moseley wrote:
> >On Wed, Jun 02, 2004 at 11:08:25AM -0500, Timm Murray wrote:
> >> At 09:01 AM 6/2/04 -0700, Bill Moseley wrote:
> >> <>
> >> >Spider your content instead.  Does htdig have anything like
> >> ><!-- noindex --> ?
> >>
> >> We discussed spidering and my boss is against the idea.  I personally
> >> would prefer not to do it if we can get away with it.
> >
> >Can you explain why?  I'm just curious why spidering would not be ok,
> >but scanning the file system would be.
>
> Extra work on the server itself.  Scanning the file system is easier on the
> system than making a request to Apache, even on localhost.

Set a delay on the spider, or do what I've done: run a separate httpd
with keep-alives set very high and a low number of max clients.  Use a
spider that supports keep-alives.

If you spider, then you are indexing the content that people can
actually find, and you don't have to worry about rewrites or aliases.
You also don't need a separate system to generate the content just for
indexing.

> I don't think either solution is particularly difficult to implement,
> but scanning the content files directly also lets us have an easier
> time analyzing the structure of the document.

All the server does is supply the content.  Analyzing the content
happens after that, regardless of whether it came from the server or
the file system.  Spidering lets you index the content as people see it
in their browser.

Spidering isn't as expensive as people tend to think.  If you are
running dynamic content as plain CGI, then that is where your cost is.
It sounds like you have mostly static content, so that shouldn't be an
issue.

If you can avoid spidering (say your content is in a database) then,
yes, I'd always just index the content directly.

> In the case I'm suggesting using, the content files would be
> re-processed by HTML::Template for any TMPL_INCLUDE tags (however, not
> much has been created under this system besides some examples, so now
> is a good time to make incompatible changes if need be).

    <TMPL_INCLUDE NAME="file.txt" SEARCHABLE="no">

So the point is to have some other program, not based on
HTML::Template, parse that?  In that case why not just look for:

    <!-- don't index this section -->

And then use HTML::Template to generate the output for indexing.

-- 
Bill Moseley
[EMAIL PROTECTED]

_______________________________________________
Html-template-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/html-template-users
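
For illustration only, here is a minimal sketch of the comment-marker
idea from the message above: strip the marked regions from the generated
page before handing it to the indexer.  It is written in Python rather
than Perl purely as a language-neutral example.  The paired markers
`<!-- noindex -->` and `<!-- /noindex -->` and the function name
`strip_noindex` are assumptions made for this sketch; the thread only
mentions a single opening comment and does not define a closing one.

```python
import re

# Hypothetical paired markers (an assumption, not something defined in the thread).
NOINDEX_BLOCK = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*/noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html: str) -> str:
    """Return html with every region wrapped in the noindex markers removed."""
    return NOINDEX_BLOCK.sub("", html)


page = (
    "<p>Article text</p>"
    "<!-- noindex --><div>Navigation menu</div><!-- /noindex -->"
    "<p>More text</p>"
)
print(strip_noindex(page))  # prints <p>Article text</p><p>More text</p>
```

The same filter would apply whether the page was fetched by a spider or
rendered from the template files on disk, since it only looks at the
final output.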