[Robots] Re: Anti-thesaurus proposal

2001-11-29 Thread Klaus Johannes Rusch
, but holds looping links or dynamically generated links which are best navigated via the statedataless sitemaps links. The id attribute is defined as an ID, as the name implies, so its value must be unique within a document; it therefore cannot be used to mark several areas of a page as indexable. -- Klaus Johannes Rusch

[Robots] Re: Correct URL, slash at the end ?

2001-11-21 Thread Klaus Johannes Rusch
not make any assumptions about how a URL is interpreted by the server). -- Klaus Johannes Rusch [EMAIL PROTECTED] http://www.atmedia.net/KlausRusch/ -- This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send help in the body

Re: FW: Stemming and Wildcards in robots.txt files

2000-03-15 Thread Klaus Johannes Rusch
tag. robots.txt does not provide a mechanism for this. Klaus Johannes Rusch -- [EMAIL PROTECTED] http://www.atmedia.net/KlausRusch/

Re: Redirect commands

2000-04-19 Thread Klaus Johannes Rusch
. Apparently, I have heard there is a way to make a robots.txt file redirect from this sort of page. There is no redirect option in robots.txt. Many robots will honor HTTP redirects (that is, status codes 301 and 302) and the ROBOTS meta tag (in your case probably NOINDEX,FOLLOW). Klaus Johannes Rusch
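As a rough illustration of the ROBOTS meta tag mentioned above, the following Python sketch shows how a crawler might read the directives from a page. The class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are hypothetical and only for illustration; real robots may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration, not taken from any crawler.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their original case.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercased ROBOTS meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page (an assumption) using the NOINDEX,FOLLOW value from the message.
page = (
    '<html><head><meta name="ROBOTS" content="NOINDEX,FOLLOW"></head>'
    "<body></body></html>"
)
directives = robots_directives(page)
may_index = "noindex" not in directives    # page should not be indexed
may_follow = "nofollow" not in directives  # links may still be followed
```

With NOINDEX,FOLLOW a robot that honors the tag would skip indexing the page itself but still follow its links.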

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Klaus Johannes Rusch
/phf?... http://localhost/default.ida?... http://proxy/ -- Klaus Johannes Rusch [EMAIL PROTECTED] http://www.atmedia.net/KlausRusch/

[Robots] Re: leading whitespace in robots.txt files

2002-03-25 Thread Klaus Johannes Rusch
page works in Internet Explorer so it cannot be broken attitude). Rather than modifying the library, I would suggest that any application that wants to handle this content error gracefully strip leading whitespace prior to calling parse(). -- Klaus Johannes Rusch [EMAIL PROTECTED] http
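The suggestion above, stripping leading whitespace before handing the file to the parser, could look roughly like this in Python. This is only a sketch: Python's `urllib.robotparser` stands in for whichever library the thread was discussing, and the wrapper name `parse_lenient`, the bot name, and the example URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser


def parse_lenient(robots_txt):
    """Strip leading whitespace from each line, then parse.

    The library itself is left untouched; the cleanup happens in the
    calling application before parse() is invoked.
    """
    cleaned = [line.lstrip() for line in robots_txt.splitlines()]
    parser = RobotFileParser()
    parser.parse(cleaned)
    return parser


# A robots.txt with stray leading spaces and tabs (made-up sample content).
raw = "  User-agent: *\n\tDisallow: /private/\n"
rules = parse_lenient(raw)

blocked = not rules.can_fetch(
    "ExampleBot", "http://www.example.com/private/page.html"
)
allowed = rules.can_fetch(
    "ExampleBot", "http://www.example.com/public/page.html"
)
```

Keeping the cleanup in the application means a stricter parser can still be used unchanged by callers that want malformed files rejected.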

Re: [Robots] links with blanks ?

2003-03-22 Thread Klaus Johannes Rusch
as the spaces are correctly encoded either as plus signs, or as %20, the URLs are valid and should work with browsers and crawlers alike. URLs with spaces that are not encoded are not valid, and only work in some browsers. Crawlers most probably don't index those pages either. -- Klaus Johannes
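For illustration, the two encodings of a space mentioned above can be produced with Python's standard library. The path and query strings below are made-up examples; note that the plus-sign form is the convention for form-style query strings, while %20 is the general percent-encoding.

```python
from urllib.parse import quote, quote_plus

# Percent-encoding: each space becomes %20 ("/" is left as-is by default).
path = quote("/my docs/annual report.html")

# Form-style encoding, as used in query strings: each space becomes "+".
query = quote_plus("web crawler")

url = "http://www.example.com" + path + "?q=" + query
```

Here `path` is `/my%20docs/annual%20report.html` and `query` is `web+crawler`, so the resulting URL contains no raw spaces.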

Re: [Robots] Testing a Web Crawler

2004-05-26 Thread Klaus Johannes Rusch
robots for comparison (link checkers such as linklint or Watchfire's Linkbot can be very useful). How do I know if my random selection of sites algorithm is working correctly? How do you define correctness, that is, along which axes should the selection algorithm randomize? -- Klaus Johannes

Re: [Robots] Googlebot complaint (anyone from Google reading?)

2005-08-27 Thread Klaus Johannes Rusch
be helpful if you included some examples. Just guessing: Google does include pages in search results that it has not actually crawled but has identified based on links from other sites. You can identify these by the fact that they do not show details, such as an extract from the page. -- Klaus Johannes