I'm inclined to agree that a second file would probably get overlooked
by bots. I would imagine it was difficult trying to get those who run them
to respect the first one.
I was unaware of the 'Allow' command. Is there a URL that documents it?
Also, the use of wildcards when giving
___
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
--On Sunday, January 11, 2004 11:44 AM -0500 Fred Atkinson [EMAIL PROTECTED] wrote:
> I was unaware of the 'Allow' command. Is there a URL that documents it?
The Allow directive is non-standard. Don't use it.
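For what it's worth, support for Allow is inconsistent rather than absent: it comes from the 1996 IETF Internet-Draft rather than the original 1994 convention, and some parsers do implement it (Python's stdlib urllib.robotparser among them). A minimal sketch; note this particular parser checks rules in the order they appear, so the Allow line must precede the broader Disallow for the exception to take effect:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt using the non-standard Allow directive to
# carve one page out of an otherwise disallowed directory.
rules = """
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Scooter", "/private/public-page.html"))  # True
print(parser.can_fetch("Scooter", "/private/secret.html"))       # False
```

Whether any given crawler honours Allow at all is exactly the problem: a robot that only knows the 1994 rules will see Disallow: /private/ and skip the whole tree.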
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek
Another idea that has occurred to me is to simply code the information to
be indexed in the robots.txt file. Then, the robot could simply suck the
information out of the file and be done.
Example:
User-agent: Scooter
Interval: 30d
Disallow: /
Name: Fred's Site
Index: /index.html
Name: My
I don't think the explicit names would be required; most robots simply
read the title tag, or infer it from the first portion of clear text,
the content meta tag, or other document attributes. Anyway, this method
would become quite burdensome for very complicated sites. I also suspect
the file
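For what it's worth, a reader for the proposed format is short to sketch. Everything here (Interval, Name, Index) is hypothetical, taken only from the example earlier in this thread, not from any standard:

```python
# Sketch of a reader for the extended robots.txt proposed above. The
# Interval/Name/Index directives are hypothetical, from this thread only.
def parse_extended(text):
    record = {"pages": []}
    pending_name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "name":
            # a Name line labels the Index line that follows it
            pending_name = value
        elif field == "index":
            record["pages"].append({"name": pending_name, "path": value})
            pending_name = None
        elif field in ("user-agent", "interval", "disallow"):
            record[field] = value
    return record

example = """User-agent: Scooter
Interval: 30d
Disallow: /
Name: Fred's Site
Index: /index.html"""

print(parse_extended(example))
```

The parsing is trivial; the burden is in maintaining the file, which is the objection above.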
It was thus said that the Great Walter Underwood once stated:
--On Sunday, January 11, 2004 8:13 PM -0500 Sean 'Captain Napalm' Conner [EMAIL PROTECTED] wrote:
And there you go. Using the different directives makes it backwards
compatible with the original robots.txt (where an older
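That compatibility point is easy to check against an existing parser: directives a parser does not recognize are simply skipped, so adding new ones cannot break old robots. A sketch using Python's stdlib urllib.robotparser, with the hypothetical Interval line from earlier in the thread:

```python
from urllib.robotparser import RobotFileParser

# Backwards-compatibility check: a parser that predates the hypothetical
# "Interval" directive skips the line it does not recognize, while the
# standard Disallow line is still honoured.
rules = """
User-agent: Scooter
Interval: 30d
Disallow: /cgi-bin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Scooter", "/cgi-bin/test"))  # False, Disallow honoured
print(parser.can_fetch("Scooter", "/index.html"))    # True, Interval ignored
```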