According to Matthew Nuzum:
> Ht://Dig uses up nearly a gig of my bandwidth every month. It tends to
> lap itself if I run it daily, so I've started running it every other
> day. Otherwise it runs great.
>
> I am now presented with the task of needing to mirror sites, in addition
> to indexing them with my search engine. I shudder to think of the
> resources this will use, both on my web servers to be mirrored/indexed
> and my bandwidth. Disk space is not a big concern to me, though.
>
> Is it possible to create a mirror of a site using the information in
> ht://dig's databases so that I can save the extra effort of mirroring?
>
> I was using rsync to keep my bandwidth low, but now I need to switch to
> something that works like wget so that I can get static html snapshots
> instead of the actual cgi/php/asp source pages.
Just to add to what Geoff and Torsten have already written, you should be aware that indexing, mirroring or caching dynamic content (cgi/php/asp) will tend to be a high-bandwidth proposition, because every time such a page is loaded it's regarded as "new". Scripts usually don't send a Last-Modified header, so there is nothing to tell the client that the page hasn't changed since the last time it was fetched, and the whole page has to be downloaded again. Static HTML pages don't have that problem: the web server normally sends a Last-Modified header based on the file's modification time, so a client can ask for the page only if it has changed.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba
Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
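A minimal sketch of the conditional re-fetch that a Last-Modified header makes possible. It assumes Python 3's standard urllib and a hypothetical per-URL cache that remembers the Last-Modified value from the previous fetch; neither is part of ht://Dig, and the function names are illustrative only. When the header is missing, as is typical for cgi/php/asp output, there is nothing to send back, so the whole page comes down again.

```python
"""Sketch: why a missing Last-Modified header forces a full re-download.

Assumptions (not from ht://Dig): a caller-side cache that stores the
Last-Modified value per URL, and Python 3's urllib. Names are hypothetical.
"""
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


def revisit_headers(cached_last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers for re-fetching a page seen before."""
    if cached_last_modified is None:
        # Typical of script output: no stored date to compare against,
        # so the server has no way to answer "unchanged".
        return {}
    # Echo the old value back; a cooperating server may reply 304.
    return {"If-Modified-Since": cached_last_modified}


def fetch_if_changed(
    url: str, cached_last_modified: Optional[str]
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, last_modified); body is None if the page is unchanged."""
    req = urllib.request.Request(url, headers=revisit_headers(cached_last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            # 200 OK: the full page was transferred (this also happens if
            # the server simply ignores If-Modified-Since).
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            # Unchanged: only headers crossed the wire, no page body.
            return None, cached_last_modified
        raise
```

On the first visit you would call `fetch_if_changed(url, None)` and store the returned Last-Modified value; on later visits you pass that value back in. For a static page this usually costs a few hundred bytes of headers, while for a script that sends no Last-Modified every visit is a full download.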

