[Robots] Re: Anti-thesaurus proposal

2001-11-21 Thread Avi Rappoport
I like the proposal for <!-- noindex -->junk here<!-- /noindex -->, I think a lot of people have a hard time getting their heads around a stop after an implicit start. Avi -- Complete Guide to Search Engines for Web Sites and Intranets http://www.searchtools.com

[Robots] Re: Anti-thesaurus proposal

2001-11-21 Thread Sean 'Captain Napalm' Conner
It was thus said that the Great Walter Underwood once stated: As for the anti-thesaurus proposal, many search engines already provide something that does a similar job. You can mark sections of a document to not be indexed. Usually, you want to do this for the topnav, sidebars, ads, and

[Robots] Re: Anti-thesaurus proposal

2001-11-21 Thread Alan Perkins
For example, Inktomi Enterprise Search uses <!--stopindex--> and <!--startindex--> to turn indexing off and on within a page. Other engines use different tags. That's only for the Enterprise Search, not main Inktomi indexes - is that correct? I don't know any global indexes that support such a
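
To make the stop/start idea concrete, here is a minimal Python sketch of how an indexer might drop the text between the two markers before indexing a page. The function name strip_unindexed, the tolerance for whitespace inside the comment, and the choice to treat an unclosed stopindex as running to the end of the page are all assumptions made for illustration; this is not a description of how Inktomi or any other engine actually handles these tags.

```python
import re

# Matches from a stopindex marker up to the next startindex marker, or up to
# the end of the document when indexing is never switched back on (assumed
# behaviour, chosen for this sketch only).
_STOP_START = re.compile(
    r"<!--\s*stopindex\s*-->.*?(?:<!--\s*startindex\s*-->|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def strip_unindexed(html: str) -> str:
    """Return html with every stopindex...startindex region removed."""
    return _STOP_START.sub("", html)


page = "Title<!--stopindex-->topnav, ads<!--startindex-->Body text"
print(strip_unindexed(page))
```

A startindex marker with no preceding stopindex is left in place by this sketch, since indexing is already on at that point.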

[Robots] Re: Correct URL, shlash at the end ?

2001-11-21 Thread Thomas Witt
You may have more than just two scans on the resource, as urls such as http://www.abc.de/xyz/index.html will also return the same document. Calculate a checksum for each url retrieved, and compare for identical checksums. If you find that one page is identical to another, the second can
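
The checksum comparison described above could be sketched roughly as follows in Python. The function name find_duplicates, the use of SHA-256 as the checksum, and the in-memory dictionary of already-fetched page bodies are assumptions for the example; the message does not say which checksum or storage to use.

```python
import hashlib


def find_duplicates(pages: dict[str, bytes]) -> dict[str, str]:
    """Map each duplicate URL to the first URL seen with an identical body."""
    first_seen: dict[str, str] = {}   # checksum -> first URL with that body
    duplicates: dict[str, str] = {}   # duplicate URL -> URL it duplicates
    for url, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest in first_seen:
            duplicates[url] = first_seen[digest]
        else:
            first_seen[digest] = url
    return duplicates


# Hypothetical fetched bodies, keyed by URL in retrieval order.
pages = {
    "http://www.abc.de/xyz/": b"<html>same</html>",
    "http://www.abc.de/xyz/index.html": b"<html>same</html>",
    "http://www.abc.de/other.html": b"<html>different</html>",
}
print(find_duplicates(pages))
```

This only catches byte-identical documents; pages that differ by a timestamp or a rotating ad would get different checksums and would not be matched.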

[Robots] Re: Correct URL, shlash at the end ?

2001-11-21 Thread Klaus Johannes Rusch
In [EMAIL PROTECTED], Matthias Jaekle [EMAIL PROTECTED] writes: I read about adding a slash at the end of the URLs, if there is no absolute path present. But what about paths ending in subdirectories (xyz)? A link to http://www.abc.de/xyz/ might be more correct than the link to
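
The first case in the question, a URL with no path at all, can be handled from the URL text alone. A minimal Python sketch, with the hypothetical helper name add_root_slash, might look like this. It deliberately leaves http://www.abc.de/xyz untouched, on the assumption that a robot cannot tell from the string whether xyz is a file or a directory and has to find out from the server's response (for example, a redirect to the slash form).

```python
from urllib.parse import urlsplit, urlunsplit


def add_root_slash(url: str) -> str:
    """Append "/" when the URL names a host but has an empty path."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(add_root_slash("http://www.abc.de"))      # empty path gets a slash
print(add_root_slash("http://www.abc.de/xyz"))  # left as written
```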

[Robots] Re: Correct URL, shlash at the end ?

2001-11-21 Thread thomas.kay
I guess it depends on what you are asking to have returned. (And this brings up another robots.txt question, below.) http://www.abc.de/xyz : Asking for the directory. (where the service is allowed redirection to a temporary default file list or another default file as a reply if the service