Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-05 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

 The request log shows that the slashes are apparently respected.

 I retried a test case and found the same thing -- the slashes were
 respected.

OK.

 Then I remembered that I was using -i. Wget seems to work fine with
 the url on the command line; the bug only happens when the url is
 passed in with:

 cat <<EOF | wget -i -
 http://...
 EOF

But I cannot repeat that, either.  As long as the consecutive slashes
are in the query string, they're not stripped.

 Using this method is necessary since it is the ONLY secure way I
 know of to do a password-protected http request from a shell script.

Yes, that is the best way to do it.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-04 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
   Removing the offending code fixes the problem, but I'm not sure if
   this is the correct solution. I expect it would be more correct to
   remove multiple slashes only before the first occurrence of ?, but
   not afterwards.
  
  That's exactly what should happen.  Please give us more details, if
  possible accompanied by `-d' output.
 
  If you'd still like details now that you know the version I was
  using, let me know and I'll be happy to do some tests.
 
 Yes please.  For example, this is how it works for me:
 
 $ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
 DEBUG output created by Wget 1.8.2 on linux-gnu.
 
 --19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
             => `something?redirect=http:%2F%2Fwww.cnn.com'
 Resolving www.xemacs.org... done.
 Caching www.xemacs.org => 199.184.165.136
 Connecting to www.xemacs.org[199.184.165.136]:80... connected.
 Created socket 3.
 Releasing 0x8080b40 (new refcount 1).
 ---request begin---
 GET /something?redirect=http://www.cnn.com HTTP/1.0
 User-Agent: Wget/1.8.2
 Host: www.xemacs.org
 Accept: */*
 Connection: Keep-Alive
 
 ---request end---
 HTTP request sent, awaiting response...
 ...
 
 The request log shows that the slashes are apparently respected.

I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know
of to do a password-protected http request from a shell script.
Otherwise the password appears on the command line...

Rich



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

 The following code in url.c makes it impossible to request urls that
 contain multiple slashes in a row in their query string:
[...]

That code is removed in CVS, so multiple slashes now work correctly.

 Think of something like http://foo/bar/redirect.cgi?http://...
 wget translates this into: [...]

Which version of Wget are you using?  I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

 Removing the offending code fixes the problem, but I'm not sure if
 this is the correct solution. I expect it would be more correct to
 remove multiple slashes only before the first occurrence of ?, but
 not afterwards.

That's exactly what should happen.  Please give us more details, if
possible accompanied by `-d' output.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote:
 D Richard Felker III [EMAIL PROTECTED] writes:
 
  The following code in url.c makes it impossible to request urls that
  contain multiple slashes in a row in their query string:
 [...]
 
 That code is removed in CVS, so multiple slashes now work correctly.
 
  Think of something like http://foo/bar/redirect.cgi?http://...
  wget translates this into: [...]
 
 Which version of Wget are you using?  I think even Wget 1.8.2 didn't
 collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.

  Removing the offending code fixes the problem, but I'm not sure if
  this is the correct solution. I expect it would be more correct to
  remove multiple slashes only before the first occurrence of ?, but
  not afterwards.
 
 That's exactly what should happen.  Please give us more details, if
 possible accompanied by `-d' output.

If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

  Think of something like http://foo/bar/redirect.cgi?http://...
  wget translates this into: [...]
 
 Which version of Wget are you using?  I think even Wget 1.8.2 didn't
 collapse multiple slashes in query strings, only in paths.

 I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
 and it persisted.

OK.

  Removing the offending code fixes the problem, but I'm not sure if
  this is the correct solution. I expect it would be more correct to
  remove multiple slashes only before the first occurrence of ?, but
  not afterwards.
 
 That's exactly what should happen.  Please give us more details, if
 possible accompanied by `-d' output.

 If you'd still like details now that you know the version I was
 using, let me know and I'll be happy to do some tests.

Yes please.  For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.



Bug in wget: cannot request urls with double-slash in the query string

2004-02-29 Thread D Richard Felker III
The following code in url.c makes it impossible to request urls that
contain multiple slashes in a row in their query string:

else if (*h == '/')
  {
    /* Ignore empty path elements.  Supporting them well is hard
       (where do you save "http://x.com///y.html"?), and they
       don't bring any practical gain.  Plus, they break our
       filesystem-influenced assumptions: allowing them would
       make "x/y//../z" simplify to "x/y/z", whereas most people
       would expect "x/z".  */
    ++h;
  }

Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into:

http://foo/bar/redirect.cgi?http:/...

and then the web server of course gives an error. Note that the
problem occurs even if the slashes were URL-escaped, since wget
unescapes them.

Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but not
afterwards.

Rich
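
For illustration, the behaviour proposed in the message above (collapse
repeated slashes only before the first ?, and leave the query string
alone) could look roughly like the sketch below. This is not Wget's
actual code and is not taken from url.c: the function name
collapse_path_slashes and its in-place interface are assumptions made
for the example, and it expects only the path-and-query part of a URL
(no scheme or host).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT part of Wget: collapse runs of '/' in
   PATH_AND_QUERY in place, but only up to the first '?'.  Everything
   from the '?' onward is kept byte for byte.  */
void
collapse_path_slashes (char *path_and_query)
{
  char *in;   /* read position */
  char *out;  /* write position, never ahead of IN */

  assert (path_and_query != NULL);
  in = out = path_and_query;

  while (*in != '\0' && *in != '?')
    {
      /* Drop a slash that directly follows one already written.  */
      if (*in == '/' && out > path_and_query && out[-1] == '/')
        {
          ++in;
          continue;
        }
      *out++ = *in++;
    }

  /* Move the query string (if any) down unchanged, including the
     terminating NUL.  The regions may overlap, hence memmove.  */
  memmove (out, in, strlen (in) + 1);
}
```

Called on a writable buffer holding
"/bar//redirect.cgi?http://example.com//x", this leaves
"/bar/redirect.cgi?http://example.com//x": the doubled slash in the
path is collapsed, while both double slashes after the ? survive.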