Hello Arthur,

Arthur de Jong wrote on Tue, 22 May, 21:14 (+0200):
> On Fri, 2007-05-18 at 14:29 +0200, Jörg Sommer wrote:
> > % cat test.html
> > <html>
> > <body>
> > <a href=".//test.html">a link</a>
> > </body>
> > </html>
> > % webcheck test.html
> > webcheck: checking site....
> > webcheck:   file:///tmp/test.html
> > webcheck: Warning: falling back to the legacy HTML parser, consider 
> > installing BeautifulSoup
> > webcheck:   file:///tmp//test.html
> > webcheck:   file:///tmp///test.html
> [...]
> > If this is not a valid URL webcheck should warn about it. But it
> > should at least assume that multiple slashes in a URL are the same as
> > one slash.
> 
> The way I read RFC3986 (especially sections 3.3 and 6.2) is that these
> are all separate and valid URLs that point to the same resource.
> 
> In section 6.2.2.3 only the removal of "." and ".." in paths is
> mentioned although 6.2.3 does leave some room for other normalisation.

Okay, I didn't know that. Then the bug report should be retitled to
“webcheck should check every resource only once”. The problem with
webcheck's current behaviour is that it never finishes: it always finds
a new resource that it must check.
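
A minimal sketch of that idea in Python, not webcheck's actual code: keep a
set of already visited resources, keyed by a canonical form of the URL in
which runs of slashes in the path are collapsed. As noted above, RFC 3986
treats these as distinct URLs, so the collapsing is a heuristic assumption.
The names canonical_url, crawl and links_of are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return url with runs of '/' in the path collapsed to a single '/'.

    This is a heuristic, not an RFC 3986 normalisation step.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def crawl(start, links_of):
    """Visit every resource once; links_of(url) yields absolute URLs.

    links_of is a hypothetical callback standing in for the HTML parser.
    """
    seen = set()      # canonical forms of the URLs already checked
    queue = [start]
    visited = []      # URLs in the order they were actually checked
    while queue:
        url = queue.pop()
        key = canonical_url(url)
        if key in seen:
            continue  # same resource reached through another spelling
        seen.add(key)
        visited.append(url)
        queue.extend(links_of(url))
    return visited
```

With a page that links to itself through an extra slash, as test.html does,
the loop stops after the first visit instead of queueing
file:///tmp//test.html, file:///tmp///test.html and so on.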

Bye, Jörg.
-- 
God's Ten Commandments contain 172 words, the American Declaration of
Independence 300 words, and the European Community regulation on the
import of caramel candies exactly 25,911 words.
