Crazy thought...
This is where the robots.txt file could be used to hold that information,
for the robot agents that need to know the order in which the service tries
the default file names for a "/" (directory) request.
User-agent: *
Slash: default.htm, default.html, index.htm, index.html, welcome.html,
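
A rough sketch of how a spidering module might read such a line. Note that
"Slash:" is only the directive proposed above, not part of the existing
robots.txt convention, and the parsing shown is one guess at the format:

import urllib.request

def fetch_default_names(host):
    """Return the ordered default-file names a hypothetical 'Slash:'
    directive advertises, or an empty list if none is present."""
    with urllib.request.urlopen("http://%s/robots.txt" % host) as resp:
        text = resp.read().decode("latin-1", errors="replace")
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "slash":
            # Names are listed in the order the server tries them.
            return [n.strip() for n in value.split(",") if n.strip()]
    return []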
I guess it depends on what you are asking to have returned. (And this brings
up another robots.txt question, below.)
http://www.abc.de/xyz
Asking for the directory (where the service is allowed to redirect to a
temporary default file from the list, or to another default file, as a reply if the service
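
If a service advertised its default names that way, a crawler asking for a
directory could probe them in order and settle on one canonical URL for the
reply. A minimal sketch, reusing the hypothetical fetch_default_names() above
and the example host from this message; probing with HEAD requests is just one
guess at how the reply would be chosen:

import urllib.error
import urllib.request

def resolve_directory(host, path, default_names):
    """Probe the advertised default names in order and return the first
    one the server answers for, so the directory can be indexed under
    one canonical URL instead of several equivalent ones."""
    for name in default_names:
        url = "http://%s%s/%s" % (host, path.rstrip("/"), name)
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req) as resp:
                if resp.status == 200:
                    return url
        except urllib.error.URLError:
            continue
    # Fall back to the plain directory request.
    return "http://%s%s/" % (host, path.rstrip("/"))

# e.g. resolve_directory("www.abc.de", "/xyz",
#                        fetch_default_names("www.abc.de"))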
This does lead to the question: what importance level do those
*ahem* "reference" *ahem* links actually have to offer a spider? Is the
content supplier making a default statement on the value of the references
therein to a spider? Such as: the wrapper reference adds links that I will
not show
Handling multiple character sets within the same file is still a problem.
Sometimes the agent encounters a multi-language file. At times the file
apparently uses overlapping character sets: for example, CP1252
and ISO-8859-1 are mixed (and browsers tolerate it, so the source
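
One workable heuristic: the bytes 0x80-0x9F are unprintable C1 control codes
in ISO-8859-1 but ordinary punctuation (curly quotes, bullets, the euro sign)
in CP1252, so their presence in a file labelled ISO-8859-1 is a strong hint
that the source is really CP1252. A small sketch of that check; the function
names here are mine, for illustration:

def looks_like_cp1252(raw_bytes):
    """Bytes 0x80-0x9F are C1 controls in ISO-8859-1 but printable
    punctuation in CP1252; seeing them in text labelled ISO-8859-1
    suggests the real encoding is CP1252."""
    return any(0x80 <= b <= 0x9F for b in raw_bytes)

def decode_tolerantly(raw_bytes, declared="iso-8859-1"):
    """Decode with the declared charset unless the CP1252 hint fires."""
    charset = "cp1252" if looks_like_cp1252(raw_bytes) else declared
    return raw_bytes.decode(charset, errors="replace")

# decode_tolerantly(b"It\x92s here") -> "It's here" with a proper
# right single quote, since 0x92 is that character in CP1252.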
Anyone working on a robot that marks up (semantic-web style) crawled
content and makes it available to
Hello Robots list,
Well, maybe this list can finally put to rest a great deal of the 30-second-wait
issue.
Can we all collectively research an adaptive routine?
We all need a common code routine that all our spidering modules and connecting
programs can use.
Especially when we wish to
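
As a starting point for that common routine, here is one sketch of an adaptive
wait: scale the pause between requests to a host by how quickly the host
answered the previous request, capped at the old fixed 30 seconds. The class
name and the floor/ceiling/factor values are illustrative assumptions, not
anything agreed on this list:

import time

class AdaptiveDelay:
    """Per-host pacing: wait roughly ten times the server's last
    response time, but never less than a floor or more than the old
    fixed 30-second ceiling."""

    def __init__(self, floor=1.0, ceiling=30.0, factor=10.0):
        self.floor = floor      # never hit a host faster than this
        self.ceiling = ceiling  # the old fixed wait becomes the cap
        self.factor = factor    # multiple of the server's response time
        self.next_ok = {}       # host -> earliest time for next request

    def wait(self, host):
        """Sleep until this host may be fetched again."""
        pause = self.next_ok.get(host, 0) - time.time()
        if pause > 0:
            time.sleep(pause)

    def record(self, host, response_seconds):
        """Note how long the last fetch took and schedule the next slot."""
        delay = min(self.ceiling,
                    max(self.floor, self.factor * response_seconds))
        self.next_ok[host] = time.time() + delay

# Typical use: pacer.wait(host); time the fetch; then
# pacer.record(host, elapsed), so a fast host is revisited sooner
# and a struggling one gets the full 30-second courtesy.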