On Thu, 22 Nov 2001, George Phillips wrote:
Yes, they are extremely useful. But they're just rules that take
the stuff you used to get the current page and some relative stuff to
construct new stuff -- all done by the browser. The web server only
understands pure, unadulterated, absolute URLs.
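For what it's worth, Python's standard `urljoin` shows the kind of browser-side resolution being described (the base and relative URLs below are made-up examples, not from the original mail):

```python
from urllib.parse import urljoin

# The browser combines the URL of the current page with the relative
# reference to produce the absolute URL the server actually sees.
base = "http://www.abc.de/a/b/page.html"
print(urljoin(base, "../other.html"))  # -> http://www.abc.de/a/other.html
print(urljoin(base, "image.gif"))      # -> http://www.abc.de/a/b/image.gif
```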
The above is just for consideration if the robots.txt standard is ever
updated, so the robots could be informed of this little detail.
There was a push in '96 or '97 to update the robots.txt standard, and I
wrote a proposal back then
(http://www.conman.org/people/spc/robots2.html).
Crazy thought...
This is where the robots.txt file could be used to hold that
information for the robot agents that need to know the order in which
the default names for "/" requests are tried on that service.
    User-agent: *
    Slash: default.htm, default.html, index.htm, index.html, welcome.html,
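A robot could read such a hypothetical "Slash:" field much like the existing robots.txt fields. A rough sketch, assuming the field carries a comma-separated list of default file names per User-agent record (the function name and return shape are my own):

```python
def parse_slash_directive(robots_txt):
    """Collect the hypothetical 'Slash:' default-file list for each
    User-agent record in a robots.txt body."""
    defaults = {}
    agent = None
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            agent = value
        elif field == "slash" and agent is not None:
            defaults[agent] = [n.strip() for n in value.split(",") if n.strip()]
    return defaults

sample = "User-agent: *\nSlash: default.htm, index.html"
print(parse_slash_directive(sample))  # -> {'*': ['default.htm', 'index.html']}
```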
You may have more than just two scans of the resource, since URLs such as
http://www.abc.de/xyz/index.html will also return the same document.
Calculate a checksum for each URL retrieved and compare for identical
checksums. If you find that one page is identical to another, the second
can be dropped.
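The checksum comparison might look something like this (sha256 and the dict-of-bodies interface are my own choices for the sketch, not anything mandated by the list discussion):

```python
import hashlib

def dedupe_by_checksum(pages):
    """pages: dict mapping URL -> retrieved body (bytes).
    Returns the URLs whose body duplicates an earlier URL's body --
    the 'second' copies that can be dropped."""
    seen = {}        # checksum -> first URL that produced it
    duplicates = []
    for url, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            duplicates.append(url)
        else:
            seen[digest] = url
    return duplicates

pages = {
    "http://www.abc.de/xyz/": b"<html>same</html>",
    "http://www.abc.de/xyz/index.html": b"<html>same</html>",
    "http://www.abc.de/other": b"<html>different</html>",
}
print(dedupe_by_checksum(pages))  # -> ['http://www.abc.de/xyz/index.html']
```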
Matthias Jaekle [EMAIL PROTECTED] writes:
I read about adding a slash at the end of the URLs, if there is no
absolute path present. But what about paths ending in subdirectories
(xyz)? A link to http://www.abc.de/xyz/ might be more correct than the
link to http://www.abc.de/xyz.
I guess it depends on what you are asking to have returned. (And this
brings up another robots.txt question... below.)
http://www.abc.de/xyz
Asking for the directory (where the service is allowed to redirect to a
temporary default file list, or to reply with another default file, if the service
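The trailing slash also changes how relative links on the returned page resolve, which is one concrete reason the two forms are not interchangeable. A quick illustration with Python's standard `urljoin` (the URLs are the example hosts from this thread):

```python
from urllib.parse import urljoin

# Without the trailing slash, "xyz" is treated as a file name and is
# replaced during relative resolution; with it, "xyz/" is a directory.
print(urljoin("http://www.abc.de/xyz", "page.html"))   # -> http://www.abc.de/page.html
print(urljoin("http://www.abc.de/xyz/", "page.html"))  # -> http://www.abc.de/xyz/page.html
```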