Hello Arthur,

Arthur de Jong wrote on Tue 22. May, 21:14 (+0200):
> On Fri, 2007-05-18 at 14:29 +0200, Jörg Sommer wrote:
> > % cat test.html
> > <html>
> > <body>
> > <a href=".//test.html">a link</a>
> > </body>
> > </html>
> > % webcheck test.html
> > webcheck: checking site....
> > webcheck: file:///tmp/test.html
> > webcheck: Warning: falling back to the legacy HTML parser, consider
> > installing BeautifulSoup
> > webcheck: file:///tmp//test.html
> > webcheck: file:///tmp///test.html
> [...]
> > If this is not a valid URL webcheck should warn about it. But it
> > should at least assume that multiple slashes in a URL are the same as
> > one slash.
>
> The way I read RFC3986 (especially sections 3.3 and 6.2) is that these
> are all separate and valid URLs that point to the same resource.
>
> In section 6.2.2.3 only the removal of "." and ".." in paths is
> mentioned although 6.2.3 does leave some room for other normalisation.

Okay, I didn't know this. Then the bug report should be retitled to
“webcheck should check every resource only once”. The problem with the
current behaviour of webcheck is that it never finishes: it always finds
a new resource it must check.

Bye, Jörg.
-- 
The Ten Commandments contain 172 words, the American Declaration of
Independence 300 words, and the European Community regulation on the
import of caramel sweets exactly 25911 words.
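
To illustrate what “check every resource only once” could look like, here is a minimal Python sketch. It is not webcheck's actual code: `canonical_key`, `crawl` and the `get_links` callback are hypothetical names. It also assumes that collapsing repeated slashes in the path is an acceptable equivalence for deduplication, which goes beyond the normalisation RFC 3986 section 6.2.2.3 describes.

```python
import re
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def canonical_key(url):
    """Return a deduplication key for url (hypothetical helper).

    Assumption: runs of '/' in the path are treated as a single '/'.
    RFC 3986 regards such URLs as distinct, so the key is only used
    to decide whether a resource was already visited.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    # Drop the fragment: it does not identify a separate resource.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def crawl(start, get_links):
    """Visit each resource once; get_links(url) returns the URLs it links to."""
    seen = set()
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        key = canonical_key(url)
        if key in seen:
            continue  # same resource reached through a different spelling
        seen.add(key)
        visited.append(url)
        queue.extend(get_links(url))
    return visited
```

With a `get_links` that keeps producing a URL with one more slash, as the `.//test.html` link does above, the loop stops after the first visit instead of running forever.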

