Hello there,

When a directory (or some `clean' URL) is requested without a trailing slash, e.g. example.com/foo, web servers often add the slash by redirecting example.com/foo -> example.com/foo/. I'll hereafter call such redirects `trivial'.
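To pin down what I mean by `trivial', here is a minimal sketch in Python. It is not Wget code: the name is_trivial_redirect is hypothetical, and I assume both URLs are absolute (a relative Location header would have to be resolved first). Under this definition a redirect that also changes the scheme or the host does not count as trivial.

```python
from urllib.parse import urlsplit


def is_trivial_redirect(original: str, location: str) -> bool:
    """Return True if `location` is `original` with one slash appended to the path.

    Hypothetical helper for illustration; assumes both URLs are absolute.
    """
    old, new = urlsplit(original), urlsplit(location)
    # Scheme and host must be unchanged (host names compare case-insensitively).
    same_origin = (old.scheme, old.netloc.lower()) == (new.scheme, new.netloc.lower())
    # Query string and fragment must be carried over untouched.
    same_rest = (old.query, old.fragment) == (new.query, new.fragment)
    # The only permitted difference: exactly one '/' appended to the path.
    return (
        same_origin
        and same_rest
        and not old.path.endswith("/")
        and new.path == old.path + "/"
    )
```

For example, is_trivial_redirect("http://example.com/foo", "http://example.com/foo/") holds, while a redirect to another host, or one that changes the path in any other way, does not.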
The problem is that some websites (e.g. ocw.mit.edu) use links without the trailing slash. When Wget (with -r) retrieves example.com/foo, it saves the content to the file `foo', regardless of the redirect. Then, when Wget parses `foo' and finds a link to example.com/foo/file.bar, it deletes the regular file `foo' and creates a directory with the same name (in the function mkalldirs(), see url.c:1220). As a result the entire page is lost.

A reproducer (GNU Wget 1.14.97-1221):

  $ wget -d -r --no-parent http://ocw.mit.edu/courses/mathematics/18-100b-analysis-i-fall-2010/ 2>&1 | grep "directory danger"
  Removing ocw.mit.edu/courses/<skipped>/assignments because of directory danger!
  Removing ocw.mit.edu/courses/<skipped>/readings-notes because of directory danger!
  Removing ocw.mit.edu/courses/<skipped>/study-materials because of directory danger!

--trust-server-names solves this problem, but it is not obvious to a user that it has to be passed every time together with -r, to say nothing of the security implications.

Does it sound reasonable to handle such `trivial' redirects (those that simply add a trailing slash) as a special case, regardless of `trust-server-names'?

Thanks

--
Maxim Kuznetsov
