Let me add: Please, for the love of Pete, respect the directives in the robots.txt file.
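For anyone rolling their own spider, here is a rough sketch of what "respecting robots.txt" can look like in practice. It uses Python's standard urllib.robotparser module. The sample rules, the "MySpider" user-agent name, the example.com URLs, and the two helper functions are all made up for illustration; they are not from any particular tool mentioned in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without a network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(parser, user_agent, urls):
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    candidates = [
        "http://example.com/index.cfm",
        "http://example.com/private/report.cfm",
    ]
    for url in filter_allowed(parser, "MySpider", candidates):
        print("OK to fetch:", url)
```

Against a live site you would normally point the parser at the real file with set_url() and read() instead of parsing an inline string, and check every URL before requesting it.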
> -----Original Message-----
> From: Jim Davis [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 05, 2003 10:45 AM
> To: CF-Talk
> Subject: RE: Remotely spider websites
>
> > -----Original Message-----
> > From: Jillian Carroll [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, February 05, 2003 10:18 AM
> > To: CF-Talk
> > Subject: Remotely spider websites
> >
> > From what I have been able to find, the verity spider will
> > only spider sites on a single network domain... so a remote
> > machine must reside on the same domain. What I'm wondering
> > is if anybody
>
> Yup. Verity (and MS Index Server) are what's called "Worms" - they
> "burrow" through the file system of the machine itself. "Spiders", on
> the other hand, "walk the web" and see only what's public through HTTP.
>
> There are pros and cons to both, of course, but that's the main
> difference.
>
> > can recommend (or has written) a tool that will let me spider
> > remote sites in a CF-friendly manner?
>
> There are a ton, but I don't know how well supported they are (or if
> they even exist any longer). Most of the major search engines have a
> "home game" version of their software (either as a software package or
> as a service they offer):
>
> Alta Vista: http://solutions.altavista.com/
>
> Google: http://www.google.com/services/
>
> Lycos: http://enterprise.lycos.com/Search/SiteSearch.asp
>
> You can dig up a lot more of these types of services by visiting
> http://www.searchengines.com/ Unfortunately there's no specific content
> directed towards DIY, but they do have the largest collection of public
> engines around.
>
> Now the site to go for more general information is
> http://www.searchenginewatch.com/
>
> This is probably the most relevant page for you:
>
> http://www.searchenginewatch.com/resources/software.html
>
> But there's a lot more material on the site aimed at webmasters.
>
> Hope this helps,
>
> Jim Davis
> President, Depressed Press of Boston: http://www.DepressedPress.com/
> Webmaster, First Night Boston: http://www.firstnight.org/
> Senior Consultant, Metlife eCommerce IT: http://www.metlife.com/

