from:"Klaus Johannes Rusch"

[Robots] Re: Anti-thesaurus proposal

2001-11-29 Thread Klaus Johannes Rusch

> ..., but holds looping links or dynamically generated links, which are best navigated via the statedataless sitemaps links.

The id attribute is defined as an ID, as the name implies, so it must be unique within a page; it therefore cannot be used to mark several areas of the page as indexable. -- Klaus Johannes Rusch
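Because an id value must be unique within a document, reusing one id on several regions makes the page invalid. The sketch below, assuming Python's standard `html.parser` module, lists id values that occur more than once; the helper names `IdCollector` and `duplicate_ids` and the sample `id="a"` are hypothetical, for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Hypothetical helper: records the value of every id attribute it sees."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are already lowercased.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(html: str) -> list[str]:
    """Return, sorted, the id values that appear more than once in html."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids('<div id="a">x</div><p id="b"></p><div id="a">y</div>'))
```

A page that reuses an id for every "indexable" region would be reported here, which is why the id attribute is a poor fit for that purpose.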

[Robots] Re: Correct URL, slash at the end ?

2001-11-21 Thread Klaus Johannes Rusch

not make any assumptions about how a URL is interpreted by the server). -- Klaus Johannes Rusch [EMAIL PROTECTED] http://www.atmedia.net/KlausRusch/ -- This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send help in the body
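One concrete reason a robot cannot treat a URL with and without a trailing slash as the same page is that relative links resolve differently against each. A minimal sketch using Python's `urllib.parse.urljoin`; the host `example.com`, the paths and the helper `resolve` are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

# Placeholder URLs that differ only by a trailing slash.
WITHOUT_SLASH = "http://example.com/docs"
WITH_SLASH = "http://example.com/docs/"


def resolve(page_url: str, href: str) -> str:
    """Resolve a relative link found on page_url into an absolute URL."""
    return urljoin(page_url, href)


# The same relative link points to two different documents:
print(resolve(WITHOUT_SLASH, "intro.html"))  # http://example.com/intro.html
print(resolve(WITH_SLASH, "intro.html"))     # http://example.com/docs/intro.html
```

Whether the server also returns the same content for both forms is a separate question the robot cannot answer by inspecting the URL alone.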

Re: FW: Stemming and Wildcards in robots.txt files

2000-03-15 Thread Klaus Johannes Rusch

tag. robots.txt does not provide a mechanism for this.

Re: Redirect commands

2000-04-19 Thread Klaus Johannes Rusch

> Apparently, I have heard there is a way to make a robots.txt file redirect from this sort of page.

There is no redirect option in robots.txt. Many robots will honor HTTP redirects (that is, status codes 301 and 302) and the ROBOTS meta tag (in your case, probably NOINDEX,FOLLOW).
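To illustrate the meta-tag route, here is a minimal sketch of how a robot might read the directives from a ROBOTS meta tag, assuming Python's standard `html.parser`; the class `RobotsMetaParser` and function `robots_directives` are hypothetical names, and real robots may handle the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the lowercased robots directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

With NOINDEX,FOLLOW a compliant robot skips indexing the page itself but still follows its links to the intended destination.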

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Klaus Johannes Rusch

/phf?... http://localhost/default.ida?... http://proxy/

[Robots] Re: leading whitespace in robots.txt files

2002-03-25 Thread Klaus Johannes Rusch

page works in Internet Explorer, so it cannot be broken attitude). Rather than modifying the library, I would suggest that any application wanting to handle this content error gracefully strip leading whitespace before calling parse().
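The suggested pre-processing step can be sketched in Python, assuming the standard `urllib.robotparser.RobotFileParser` stands in for the library under discussion; the helper names are hypothetical. (CPython's parser already strips each line itself, so here the step mainly illustrates the idea for stricter parsers.)

```python
from urllib.robotparser import RobotFileParser


def strip_leading_whitespace(robots_txt: str) -> list[str]:
    """Hypothetical helper: split a robots.txt body into lines without leading whitespace."""
    return [line.lstrip() for line in robots_txt.splitlines()]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Clean the content first, then hand the lines to parse() unchanged."""
    parser = RobotFileParser()
    parser.parse(strip_leading_whitespace(robots_txt))
    return parser


# A badly formatted file with indented records (example.com is a placeholder).
body = "  User-agent: *\n\tDisallow: /private/\n"
rules = build_parser(body)
print(rules.can_fetch("MyBot", "http://example.com/private/page.html"))  # False
print(rules.can_fetch("MyBot", "http://example.com/index.html"))         # True
```

Keeping the cleanup in the application leaves the library's parser strict, which is the design choice the message argues for.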

Re: [Robots] links with blanks ?

2003-03-22 Thread Klaus Johannes Rusch

as the spaces are correctly encoded, either as %20 or (within a query string) as plus signs, the URLs are valid and should work with browsers and crawlers alike. URLs containing unencoded spaces are not valid and work only in some browsers. Crawlers most probably don't index those pages either.
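The two encodings apply to different parts of a URL. A short sketch with Python's `urllib.parse`; the helper names `encode_path` and `encode_query_value` and the sample strings are hypothetical.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def encode_path(path: str) -> str:
    """Percent-encode a URL path: spaces become %20, '/' separators are kept."""
    return quote(path, safe="/")


def encode_query_value(value: str) -> str:
    """Form-encode a query-string value: spaces become '+'."""
    return quote_plus(value)


print(encode_path("/my docs/annual report.html"))  # /my%20docs/annual%20report.html
print(encode_query_value("web crawler"))           # web+crawler

# '+' stands for a space only when decoding form-encoded query strings;
# plain percent-decoding leaves it as a literal plus sign.
print(unquote_plus("web+crawler"))  # web crawler
print(unquote("web+crawler"))       # web+crawler
```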

Re: [Robots] Testing a Web Crawler

2004-05-26 Thread Klaus Johannes Rusch

robots for comparison (link checkers such as linklint or Watchfire's Linkbot can be very useful).

> How do I know if my random selection of sites algorithm is working correctly?

How do you define correctness, that is, along which axes should the selection algorithm randomize?

Re: [Robots] Googlebot complaint (anyone from Google reading?)

2005-08-27 Thread Klaus Johannes Rusch

be helpful if you included some examples. Just guessing: Google does include pages in search results that it has not actually crawled but has identified from links on other sites. You can recognize these because they show no details, such as an extract from the page.

9 matches