Hello Robots list
Well, maybe this list can finally put to rest much of the 30-second wait
issue.
Can we all collectively research an adaptive routine?
We all need a common code routine that all our spidering modules and connective
programs can use.
Especially when we wish to
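One way such a common routine might work (a minimal sketch, not an agreed-upon design; the class name, the 10x multiplier, and the clamping bounds are all assumptions) is to start from the traditional 30-second wait and then adapt the delay to how quickly the server actually responds:

```python
class AdaptiveDelay:
    """Hypothetical adaptive politeness routine for a spider:
    begin with a fixed base delay, then scale the wait by the
    server's observed response time, clamped to sane bounds."""

    def __init__(self, base=30.0, floor=5.0, ceiling=120.0):
        self.base = base          # traditional 30-second wait
        self.floor = floor        # never hammer faster than this
        self.ceiling = ceiling    # never back off longer than this
        self.last_response_time = None

    def record(self, response_time):
        """Remember how long the last request took, in seconds."""
        self.last_response_time = response_time

    def next_delay(self):
        """Seconds to wait before the next request to this host."""
        if self.last_response_time is None:
            # No history yet: fall back to the fixed base delay.
            return self.base
        # Wait roughly 10x the server's response time, clamped.
        return min(self.ceiling, max(self.floor, self.last_response_time * 10))
```

A fast server (0.2 s responses) would then be polled every 5 seconds, while a struggling one (4 s responses) would get a 40-second gap.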
Human Resources Development Canada
Développement des ressources humaines Canada
__
Anyone working on a robot that marks up (semantic-web style) crawled
content and makes it available to
Handling multiple character sets within the same file is still a problem.
Sometimes the agent encounters a multilingual file. At times the file
apparently uses overlapping character sets. Character sets like CP1252
and ISO-8859-1 are mixed (and browsers tolerate it, so the source
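One pragmatic way an agent might cope with this (a sketch, not a standard approach; the function name and the ordering of fallbacks are my assumptions) is to try encodings in order, exploiting the fact that CP1252 assigns printable characters to the 0x80-0x9F range that ISO-8859-1 leaves as control codes:

```python
def decode_lenient(raw: bytes) -> str:
    """Decode a fetched page whose declared charset may be wrong.
    Try strict UTF-8 first, then CP1252 (which covers the bytes
    browsers tolerate in nominally ISO-8859-1 pages)."""
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 maps every byte, so this always succeeds.
    return raw.decode("iso-8859-1")
```

For example, the CP1252 "smart quote" bytes 0x93/0x94 are invalid UTF-8, so they fall through to the CP1252 pass and decode as real quotation marks instead of control characters.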
This does lead to the question: what importance level do those
*ahem* reference *ahem* links actually have to offer a spider? Is the
content supplier making a default statement about the value of the references
therein to a spider? Such as: the wrapper reference adds links that I will
not show
Crazy thought...
This is where the robots.txt file could be used to hold that
information for the robot agents that need to know the operational order
of the default file names used for "/" on that service.
User-agent: *
Slash: default.htm, default.html, index.htm, index.html, welcome.html,
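Since "Slash:" is a proposed extension rather than anything in the robots.txt standard, a robot would have to parse it itself. A minimal sketch (the function name is mine, and real robots.txt parsing involves per-agent record handling this ignores):

```python
def parse_slash_directive(robots_txt: str) -> list[str]:
    """Extract the ordered default-file names from the proposed,
    non-standard 'Slash:' line of a robots.txt file."""
    names = []
    for line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("slash:"):
            value = line[len("slash:"):]
            names.extend(n.strip() for n in value.split(",") if n.strip())
    return names
```

Given the example record above, this would yield the names in the server's preferred order, so the robot knows which URL a bare "/" request is likely to resolve to.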
I guess it depends on what you are asking to have returned. (And this brings
up another robots.txt question... below)
http://www.abc.de/xyz
Asking for the directory (where the service is allowed to redirect to a
temporary default-file list, or to reply with another default file, if the service