I'll try to explain better what I was asking:
Say I have htdig_one.conf
start_url: http://www.mydomain.com/
and http://www.mydomain.com/robots.txt has:
Disallow: /members/
Then http://www.mydomain.com/members/ will not get spidered/indexed into the database for htdig_one.conf
Ok pretty standard and simple. Now the question:
I want to set up a separate database for http://www.mydomain.com/members/ so I do this:
(I realize the data is still accessible, so the separate
database doesn't secure the data; I simply need the data separated.)
htdig_two.conf
start_url: http://www.mydomain.com/members/
That will create the db ... but http://www.mydomain.com/robots.txt still has:
Disallow: /members/
in it.
So will htdig_two.conf still be able to spider/index http://www.mydomain.com/members/?
Or will the http://www.mydomain.com/robots.txt file stop htdig in its tracks in this case?
As described, htdig will not index /members/. It always checks for /robots.txt and respects any stated exclusions. However, the robots exclusion protocol allows you to specify disallows on an agent-by-agent basis. So if you use the user_agent attribute to specify a different agent name in the second configuration file, you can then define a robots.txt file that allows the agent you defined access to the members directory. I don't recall the exact syntax for the robots.txt file, but if you check a tutorial on the subject you should find that it is pretty straightforward. The user_agent attribute is described at http://www.htdig.org/attrs.html#user_agent.
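To make the per-agent idea concrete, here is a rough sketch of what such a robots.txt might look like, checked with Python's standard urllib.robotparser rather than with htdig itself. The agent name "members-indexer" is invented for illustration (it would be the value given to user_agent in htdig_two.conf), and "htdig" is assumed to be the agent name the first configuration sends. How htdig matches agent names against robots.txt may differ from Python's parser, so treat this as a syntax check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "members-indexer" is an invented agent name,
# assumed to match the user_agent attribute set in htdig_two.conf.
ROBOTS_TXT = """\
User-agent: members-indexer
Disallow:

User-agent: *
Disallow: /members/
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the given agent may fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    members = "http://www.mydomain.com/members/"
    # Agent name assumed for the first configuration; it falls under "*".
    print("htdig:", allowed("htdig", members))
    # The specially named agent has an empty Disallow, so nothing is blocked.
    print("members-indexer:", allowed("members-indexer", members))
```

An empty Disallow line means nothing is excluded for that agent, while every other crawler falls through to the * record and stays out of /members/.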
If you are simply trying to exclude /members/ from one database and are not really concerned about what other crawlers are doing, then the easiest thing would probably be to use the exclude_urls attribute to drop URLs that contain /members/ in the first database.
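As a rough illustration of that second approach, the sketch below mimics the "drop URLs that contain a pattern" behaviour in Python. It is not htdig code: the function name is made up, and the pattern list (as it might appear after exclude_urls: in htdig_one.conf) is only an assumed example, so check the attribute documentation for the real default before copying it.

```python
def is_excluded(url: str, exclude_urls: str) -> bool:
    """Return True if url contains any of the space-separated patterns."""
    return any(pattern in url for pattern in exclude_urls.split())


# Assumed example value for the exclude_urls attribute in htdig_one.conf.
EXCLUDE_URLS = "/cgi-bin/ .cgi /members/"

if __name__ == "__main__":
    for url in (
        "http://www.mydomain.com/index.html",
        "http://www.mydomain.com/members/profile.html",
    ):
        print(url, "excluded" if is_excluded(url, EXCLUDE_URLS) else "kept")
```

With this approach robots.txt would no longer need the /members/ entry for htdig's sake, so the second configuration could crawl /members/ under the default agent name.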
Jim

