Geocities has a robots.txt file that prevents htdig from crawling the
site.  I have written to them; they claimed to have fixed it, but the
problem remains.

1. What's the exact user-agent or entry they should be putting into
their robots.txt file to let us in?
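For reference, I believe an entry along these lines would do it,
assuming htdig identifies itself as "htdig" when it checks robots.txt
(an empty Disallow line means "allow everything" for that agent):

```
User-agent: htdig
Disallow:
```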

2. Is there a way to change the user-agent string that htdig sends?  I
could make it look like a Netscape browser.  I hate to do this, since
it's just not 'right', although it seems it would work.
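For what it's worth, I believe the string is configurable in
htdig.conf; something like the following (attribute names from memory,
so check the attribute documentation before relying on them):

```
user_agent:      Mozilla/4.0 (compatible; htdig)
robotstxt_name:  htdig
```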

Geocities is apparently using this robots.txt to stop spam crawlers
from trawling their site for email addresses, but I wasn't aware that
spammers were known for being voluntarily ethical.





----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the body of the message.
