Thanks for helping clarify that for me. I'll look into the user_agent/allow/disallow stuff.
Thanks for the pointers!

Dan

> On Wednesday, June 25, 2003, at 04:19 PM, Dan Muey wrote:
>
> > I'll try to explain better what I was asking:
> >
> > Say I have htdig_one.conf
> > start_url: http://www.mydomain.com/
> >
> > and http://www.mydomain.com/robots.txt has:
> >
> > Disallow: /members/
> >
> > Then http://www.mydomain.com/members/ will not get spidered/indexed
> > into the database for htdig_one.conf
> >
> > Ok, pretty standard and simple. Now the question:
> >
> > I want to set up a separate database for
> > http://www.mydomain.com/members/ so I do this:
> > (I realize the data is still accessible, so the separate
> > database doesn't secure the data; I simply need the data separated)
> >
> > htdig_two.conf
> > start_url: http://www.mydomain.com/members/
> > that will create the db ...
> >
> > but http://www.mydomain.com/robots.txt still has:
> >
> > Disallow: /members/
> >
> > in it.
> >
> > So will htdig_two.conf still be able to spider/index
> > http://www.mydomain.com/members/
> > Or will the http://www.mydomain.com/robots.txt file stop htdig in its
> > tracks in this case?
>
> As described, htdig will not index /members/. It always checks for
> /robots.txt and respects any stated exclusions. However, the robots
> exclusion protocol allows you to specify disallows on an agent-by-agent
> basis. So if you use the user_agent attribute to specify a
> different agent name in the second configuration file, you can then
> define a robots.txt file that allows the agent you defined access to
> the members directory. I don't recall the exact syntax for the
> robots.txt file, but if you check a tutorial on the subject you should
> find that it is pretty straightforward. The user_agent attribute is
> described at http://www.htdig.org/attrs.html#user_agent.
>
> If you are simply trying to exclude /members/ from one database and are
> not really concerned about what other crawlers are doing, then the
> easiest thing would probably be to use the exclude_urls attribute to
> drop URLs that contain /members/ in the first database.
>
> Jim
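
For reference, the per-agent robots.txt approach Jim describes could look roughly like the sketch below. It is only an illustration under stated assumptions: the agent name "members-indexer" is hypothetical (it would be whatever name is set with the user_agent attribute in htdig_two.conf), the robots.txt contents are assumed rather than taken from the real site, and "htdig" is assumed to be the agent name used by the first configuration. The check uses Python's standard urllib.robotparser, which may not match htdig's own robots.txt parsing in every detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical agent name; it would be set through the user_agent
# attribute in htdig_two.conf. The name is an assumption for illustration.
MEMBERS_AGENT = "members-indexer"

# Assumed contents of http://www.mydomain.com/robots.txt.
# An empty "Disallow:" means nothing is off limits for that agent.
ROBOTS_TXT = """\
User-agent: members-indexer
Disallow:

User-agent: *
Disallow: /members/
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the assumed robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    members_url = "http://www.mydomain.com/members/"
    # Compare the dedicated agent with the assumed default agent name.
    for agent in (MEMBERS_AGENT, "htdig"):
        print(agent, allowed(agent, members_url))
```

With a robots.txt along these lines, every other crawler stays out of /members/ while the second configuration can index it. If other crawlers are not a concern, the exclude_urls route Jim mentions avoids touching robots.txt at all.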

