Package: wget
Version: 1.10.2-1
Severity: normal

When I run wget -r -np http://example.com/foo/bar/ it only downloads
what is in "bar" and below, but if I strip the slash at the end, it
also downloads everything in "foo".  I think this is against what
users expect: wget should look at the final location of the URL given
by the user (i.e. .../foo/bar/index.html) and then take the last
directory of that location as the base.
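To illustrate why the trailing slash matters, here is a minimal Python
sketch of how a "no parent" base directory can be derived from a URL.
This is only an illustration of the idea, not wget's actual code, and
base_dir is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def base_dir(url):
    """Return the URL path up to and including its last slash.

    Hypothetical helper: everything after the last slash is treated
    as a file name, so a missing trailing slash moves the base one
    directory up.
    """
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


print(base_dir("http://example.com/foo/bar"))   # /foo/
print(base_dir("http://example.com/foo/bar/"))  # /foo/bar/
```

Without the slash, "bar" is taken as a file name inside "foo", so "foo"
becomes the directory that -np refuses to leave.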


Example:

# Without slash at the end it downloads too much

669:[EMAIL PROTECTED]: ~/tmp/src] wget -r -np http://www.cyrius.com/test/wget/foo
--11:03:52--  http://www.cyrius.com/test/wget/foo
           => `www.cyrius.com/test/wget/foo'
Resolving www.cyrius.com... 65.19.161.204
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.cyrius.com/test/wget/foo/ [following]
--11:03:52--  http://www.cyrius.com/test/wget/foo/
           => `www.cyrius.com/test/wget/foo/index.html'
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140           --.--K/s

11:03:52 (7.85 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

Loading robots.txt; please ignore errors.
--11:03:52--  http://www.cyrius.com/robots.txt
           => `www.cyrius.com/robots.txt'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 72 [text/plain]

100%[==========================================================>] 72            --.--K/s

11:03:53 (6.87 MB/s) - `www.cyrius.com/robots.txt' saved [72/72]

--11:03:53--  http://www.cyrius.com/test/wget/index.html
           => `www.cyrius.com/test/wget/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 157 [text/html]

100%[==========================================================>] 157           --.--K/s

11:03:53 (16.64 MB/s) - `www.cyrius.com/test/wget/index.html' saved [157/157]

--11:03:53--  http://www.cyrius.com/test/wget/foo/index2.html
           => `www.cyrius.com/test/wget/foo/index2.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67            --.--K/s

11:03:53 (4.26 MB/s) - `www.cyrius.com/test/wget/foo/index2.html' saved [67/67]

--11:03:53--  http://www.cyrius.com/test/wget/foo/index.html
           => `www.cyrius.com/test/wget/foo/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140           --.--K/s

11:03:53 (14.83 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

--11:03:53--  http://www.cyrius.com/test/wget/bar/index.html
           => `www.cyrius.com/test/wget/bar/index.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67            --.--K/s

11:03:53 (9.13 MB/s) - `www.cyrius.com/test/wget/bar/index.html' saved [67/67]


FINISHED --11:03:53--
Downloaded: 643 bytes in 6 files


# With slash at the end it works

670:[EMAIL PROTECTED]: ~/tmp/src] rm -rf www.cyrius.com
671:[EMAIL PROTECTED]: ~/tmp/src] wget -r -np http://www.cyrius.com/test/wget/foo/
--11:04:12--  http://www.cyrius.com/test/wget/foo/
           => `www.cyrius.com/test/wget/foo/index.html'
Resolving www.cyrius.com... 65.19.161.204
Connecting to www.cyrius.com|65.19.161.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 140 [text/html]

100%[==========================================================>] 140           --.--K/s

11:04:13 (13.35 MB/s) - `www.cyrius.com/test/wget/foo/index.html' saved [140/140]

Loading robots.txt; please ignore errors.
--11:04:13--  http://www.cyrius.com/robots.txt
           => `www.cyrius.com/robots.txt'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 72 [text/plain]

100%[==========================================================>] 72            --.--K/s

11:04:13 (8.58 MB/s) - `www.cyrius.com/robots.txt' saved [72/72]

--11:04:13--  http://www.cyrius.com/test/wget/foo/index2.html
           => `www.cyrius.com/test/wget/foo/index2.html'
Reusing existing connection to www.cyrius.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 67 [text/html]

100%[==========================================================>] 67            --.--K/s

11:04:13 (2.56 MB/s) - `www.cyrius.com/test/wget/foo/index2.html' saved [67/67]


FINISHED --11:04:13--
Downloaded: 279 bytes in 3 files


# Let's look at the location...

675:[EMAIL PROTECTED]: ~/tmp/src] telnet sorrow.cyrius.com 80
Trying 65.19.161.204...
Connected to sorrow.cyrius.com.
Escape character is '^]'.
GET /test/wget/foo HTTP/1.1
Host: www.cyrius.com

HTTP/1.1 301 Moved Permanently
Date: Sat, 05 Nov 2005 11:06:26 GMT
Server: Apache/1.3.33 (Debian GNU/Linux)
Location: http://www.cyrius.com/test/wget/foo/
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

137
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.cyrius.com/test/wget/foo/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at www.cyrius.com Port 80</ADDRESS>
</BODY></HTML>


It says "Location: http://www.cyrius.com/test/wget/foo/", so -np should take
foo/ as the base path and not wget/.
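
As a rough sketch of the behaviour I would expect, the following Python
snippet applies a "no parent" check to the links from the first run
above, once against the URL as typed and once against the URL from the
Location header.  It is not wget's implementation; base_dir and allowed
are hypothetical names, and the check only compares paths and ignores
the host.

```python
from urllib.parse import urlsplit


def base_dir(url):
    """Return the URL path up to and including its last slash."""
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def allowed(url, start_url):
    """True if url lies at or below the directory of start_url.

    Hypothetical check; only paths are compared, not hosts.
    """
    path = urlsplit(url).path or "/"
    return path.startswith(base_dir(start_url))


requested = "http://www.cyrius.com/test/wget/foo"  # as typed by the user
final = "http://www.cyrius.com/test/wget/foo/"     # from the 301 Location header

# Links fetched in the first run above (robots.txt left out).
links = [
    "http://www.cyrius.com/test/wget/index.html",
    "http://www.cyrius.com/test/wget/foo/index2.html",
    "http://www.cyrius.com/test/wget/foo/index.html",
    "http://www.cyrius.com/test/wget/bar/index.html",
]

for link in links:
    print(link, allowed(link, requested), allowed(link, final))
```

Checked against the requested URL, all four links are accepted, which
matches the first run.  Checked against the redirect target, only the
two files under foo/ are accepted, which matches the run with the
trailing slash.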




-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages wget depends on:
ii  libc6                         2.3.5-7    GNU C Library: Shared libraries an
ii  libssl0.9.8                   0.9.8a-2   SSL shared libraries

wget recommends no packages.

-- no debconf information

-- 
Martin Michlmayr
http://www.cyrius.com/

