Package: wget
Version: 1.10.2-1
Severity: normal

When I use

  wget -r -np http://example.com/foo/bar/

it will only download stuff in "bar" and below, but if I strip away the slash at the end, it will also download everything in "foo". I think this is against what users expect: wget should look at the final location of the URL given by the user (i.e. .../foo/bar/index.html) and then take the last directory.
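The difference between the two invocations can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not wget's actual code: it assumes -np treats the start URL's path up to and including its last '/' as the recursion root, which is consistent with the transcripts in this report. The helpers `np_base_dir`, `allowed` and `expected_base_dir` are hypothetical names for illustration only.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def np_base_dir(url: str) -> str:
    """Hypothetical model of the -np root: the URL path up to and including
    its last '/' (so ".../wget/foo" -> ".../wget/")."""
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def allowed(candidate: str, base_dir: str) -> bool:
    """Hypothetical check: follow a link only if its path lies under base_dir."""
    return urlsplit(candidate).path.startswith(base_dir)


def expected_base_dir(start_url: str, location: Optional[str]) -> str:
    """Behaviour requested in this report: compute the root from the final URL,
    i.e. after following a redirect's Location header, if there was one."""
    final_url = urljoin(start_url, location) if location else start_url
    return np_base_dir(final_url)


if __name__ == "__main__":
    start = "http://www.cyrius.com/test/wget/foo"
    redirect = "http://www.cyrius.com/test/wget/foo/"
    bar_page = "http://www.cyrius.com/test/wget/bar/index.html"

    observed = np_base_dir(start)                   # root from the URL as typed
    requested = expected_base_dir(start, redirect)  # root from the redirect target
    print(observed, allowed(bar_page, observed))    # /test/wget/ True
    print(requested, allowed(bar_page, requested))  # /test/wget/foo/ False
```

Under this model, bar/index.html passes the check when the root is taken from the URL as typed and is rejected when the root is taken from the redirect target, which is the difference between the two runs in this report.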
Example:

# Without slash at the end it downloads too much
669:[EMAIL PROTECTED]: ~/tmp/src] wget -r -np http://www.cyrius.com/test/wget/foo
--11:03:52--  http://www.cyrius.com/test/wget/foo
           => `www.cyrius.com/test/wget/foo'
Resolving www.cyrius.com... 65.19.161.204
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.cyrius.com/test/wget/foo/ [following]
--11:03:52--  http://www.cyrius.com/test/wget/foo/
           => `www.cyrius.com/test/wget/foo/index.html'
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140 --.--K/s

11:03:52 (7.85 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

Loading robots.txt; please ignore errors.
--11:03:52--  http://www.cyrius.com/robots.txt
           => `www.cyrius.com/robots.txt'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 72 [text/plain]

100%[==========================================================>] 72 --.--K/s

11:03:53 (6.87 MB/s) - `www.cyrius.com/robots.txt' saved [72/72]

--11:03:53--  http://www.cyrius.com/test/wget/index.html
           => `www.cyrius.com/test/wget/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 157 [text/html]

100%[==========================================================>] 157 --.--K/s

11:03:53 (16.64 MB/s) - `www.cyrius.com/test/wget/index.html' saved [157/157]

--11:03:53--  http://www.cyrius.com/test/wget/foo/index2.html
           => `www.cyrius.com/test/wget/foo/index2.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67 --.--K/s

11:03:53 (4.26 MB/s) - `www.cyrius.com/test/wget/foo/index2.html' saved [67/67]

--11:03:53--  http://www.cyrius.com/test/wget/foo/index.html
           => `www.cyrius.com/test/wget/foo/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140 --.--K/s

11:03:53 (14.83 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

--11:03:53--  http://www.cyrius.com/test/wget/bar/index.html
           => `www.cyrius.com/test/wget/bar/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67 --.--K/s

11:03:53 (9.13 MB/s) - `www.cyrius.com/test/wget/bar/index.html' saved [67/67]

FINISHED --11:03:53--
Downloaded: 643 bytes in 6 files

# With slash at the end it works
670:[EMAIL PROTECTED]: ~/tmp/src] rm -rf www.cyrius.com
671:[EMAIL PROTECTED]: ~/tmp/src] wget -r -np http://www.cyrius.com/test/wget/foo/
--11:04:12--  http://www.cyrius.com/test/wget/foo/
           => `www.cyrius.com/test/wget/foo/index.html'
Resolving www.cyrius.com... 65.19.161.204
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140 --.--K/s

11:04:13 (13.35 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

Loading robots.txt; please ignore errors.
--11:04:13--  http://www.cyrius.com/robots.txt
           => `www.cyrius.com/robots.txt'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 72 [text/plain]

100%[==========================================================>] 72 --.--K/s

11:04:13 (8.58 MB/s) - `www.cyrius.com/robots.txt' saved [72/72]

--11:04:13--  http://www.cyrius.com/test/wget/foo/index2.html
           => `www.cyrius.com/test/wget/foo/index2.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67 --.--K/s

11:04:13 (2.56 MB/s) - `www.cyrius.com/test/wget/foo/index2.html' saved [67/67]

FINISHED --11:04:13--
Downloaded: 279 bytes in 3 files

# Let's look at the location...
675:[EMAIL PROTECTED]: ~/tmp/src] telnet sorrow.cyrius.com 80
Trying 65.19.161.204...
Connected to sorrow.cyrius.com.
Escape character is '^]'.
GET /test/wget/foo HTTP/1.1
Host: www.cyrius.com

HTTP/1.1 301 Moved Permanently
Date: Sat, 05 Nov 2005 11:06:26 GMT
Server: Apache/1.3.33 (Debian GNU/Linux)
Location: http://www.cyrius.com/test/wget/foo/
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

137
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.cyrius.com/test/wget/foo/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at www.cyrius.com Port 80</ADDRESS>
</BODY></HTML>

It says "Location: http://www.cyrius.com/test/wget/foo/", so -np should take foo/ as the path and not wget/.

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages wget depends on:
ii  libc6        2.3.5-7    GNU C Library: Shared libraries an
ii  libssl0.9.8  0.9.8a-2   SSL shared libraries

wget recommends no packages.

-- no debconf information

-- 
Martin Michlmayr
http://www.cyrius.com/

