Hi,

I've just found a case where wget 1.16.3 responds to a 302 redirect
differently depending on whether it's in an ASCII or UTF-8 locale.

This works:
LC_ALL=en_GB.UTF-8 wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2

This doesn't work:
LC_ALL=C wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2

I've attached logs with -d showing what's actually going on. The
initial request gives a 302 response with a Location: header that contains:
  ....tar.bz2?Signature=up6%2BtTpSF...

In the UTF-8 locale, wget correctly redirects to that location.

In the ASCII locale, wget -d prints a "converted: '...' -> '...'" line
(from iri.c's do_conversion), then redirects to:
  ....tar.bz2?Signature=up6+tTpSF...

(If you try it yourself you'll get a slightly different URL, but at
least for me it usually contains %2B somewhere.)
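
To illustrate why the two forms are not interchangeable, here is a small
Python sketch (not wget code). It assumes the receiving server decodes the
query string with the usual form-encoding convention, where a literal "+"
means a space; Python's parse_qs does this, and the SignatureProvided field
in the attached ASCII-locale log suggests S3 does the same. The Signature
value is copied from that log.

```python
from urllib.parse import parse_qs, unquote

# Signature parameter exactly as it appears in the Location: header.
escaped = "Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D"

# The same parameter after percent-unescaping the whole string, which is
# the form wget ends up requesting in the ASCII locale.
unescaped = unquote(escaped)

# What a form-style query parser sees in each case.
seen_from_escaped = parse_qs(escaped)["Signature"][0]
seen_from_unescaped = parse_qs(unescaped)["Signature"][0]

print(unescaped)            # the "+" is now a literal character in the URL
print(seen_from_escaped)    # signature with "+" intact
print(seen_from_unescaped)  # signature with "+" turned into a space
```

The last value has a space where the "+" used to be, which matches the
"up6 tTpSF..." that S3 reports back in its SignatureDoesNotMatch error.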

This appears to be because do_conversion calls url_unescape on the
input string it's given -- even though that input string is a _const_
char * in the code that calls it (main -> retrieve_url -> url_parse ->
remote_to_utf8 -> do_conversion). It's not immediately obvious to me
whether that's intentional or not; at the very least, it's a surprising
bit of behaviour.
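
As a rough illustration of why unescaping a whole URL is lossy, the
following Python sketch unescapes a shortened copy of the query string from
the log and then re-escapes it. This is not how wget is implemented: the
SAFE set below is an assumption, chosen only so that the result looks like
the URL wget requests in the ASCII-locale log (space and double quote get
escaped again, while "+", "/", "=" and ";" do not).

```python
from urllib.parse import quote, unquote

# Query string from the Location: header, with the Expires and
# AWSAccessKeyId parameters left out for brevity.
original = (
    "Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D"
    "&response-content-disposition="
    "attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22"
)

# ASSUMPTION: characters left alone when re-escaping. Picked to mirror the
# log output, not taken from wget's source.
SAFE = "+/=;&"

# Unescape everything, then escape only what is outside SAFE.
round_tripped = quote(unquote(original), safe=SAFE)

print(round_tripped)
print(round_tripped == original)
```

Once %2B, %2F, %3D and %3B have been turned into their literal characters,
nothing in the string records that they were ever escaped, so re-escaping
cannot bring the original URL back.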

Thanks,

-- 
Adam Sampson <[email protected]>                         <http://offog.org/>
DEBUG output created by Wget 1.16.3 on linux-gnu.

URI encoding = 'ANSI_X3.4-1968'
converted 'https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2' 
(ANSI_X3.4-1968) -> 
'https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2' (UTF-8)
--2015-03-13 22:03:25--  
https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
Certificates loaded: 174
Resolving bitbucket.org (bitbucket.org)... 131.103.20.168, 131.103.20.167
Caching bitbucket.org => 131.103.20.168 131.103.20.167
Connecting to bitbucket.org (bitbucket.org)|131.103.20.168|:443... connected.
Created socket 4.
Releasing 0x00007f8ec60bc5f0 (new refcount 1).

---request begin---
GET /pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bitbucket.org
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 FOUND
Server: nginx/1.6.2
Date: Fri, 13 Mar 2015 22:03:25 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Served-By: app22
X-Render-Time: 0.0194919109344
Content-Language: en
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
Cache-Control: max-age=0
X-Request-Count: 218
Expires: Fri, 13 Mar 2015 22:03:25 GMT
Vary: Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN
Last-Modified: Fri, 13 Mar 2015 22:03:25 GMT
X-Static-Version: c524695e7f84
X-Version: c524695e7f84
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Cache-Status: MISS

---response end---
302 FOUND
Registered socket 4 for persistent reuse.
URI content encoding = 'utf-8'
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
 [following]
] done.
URI content encoding = None
converted 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22'
 (ANSI_X3.4-1968) -> 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;
 filename="pypy-2.5.0-src.tar.bz2"' (UTF-8)
--2015-03-13 22:03:25--  
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 
54.231.2.89
Caching bbuseruploads.s3.amazonaws.com => 54.231.2.89
Connecting to bbuseruploads.s3.amazonaws.com 
(bbuseruploads.s3.amazonaws.com)|54.231.2.89|:443... connected.
Created socket 5.
Releasing 0x00007f8ec6577ea0 (new refcount 1).

---request begin---
GET 
/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22
 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bbuseruploads.s3.amazonaws.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 403 Forbidden
x-amz-request-id: 2F8F9BE6D8E16461
x-amz-id-2: 
2FCFrySY8ND/Fre+9C2iP42xNSucboxUnTK35Ycxiroa4YHfqCNX8z1jYgS7dZcXTmm+b1Kj0fs=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 13 Mar 2015 22:03:25 GMT
Server: AmazonS3

---response end---
403 Forbidden
Disabling further reuse of socket 4.
Registered socket 5 for persistent reuse.
Skipping 512 bytes of body: [<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we 
calculatSkipping 512 bytes of body: [ed does not match the signature you 
provided. Check your key and signing 
method.</Message><AWSAccessKeyId>0EMWEFSGA12Z1HF1TZ82</AWSAccessKeyId><StringToSign>GET


1426285236
/bbuseruploads/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?response-content-disposition=attachment;
 filename="pypy-2.5.0-src.tar.bz2"</StringToSign><SignatureProvided>up6 
tTpSFVJ2E/0sD6JvfNQkeJg=</SignatureProvided><StringToSignBytes>47 45 54 0a 0a 
0a 31 34 32 36 32 38 35 32 33 36 0a 2f 62 62 75 73 65 72 75 70 6c 6f 61 64 73 
2f 70 79 70 Skipping 501 bytes of body: [79 2f 70 79 70 79 2f 64 6f 77 6e 6c 6f 
61 64 73 2f 70 79 70 79 2d 32 2e 35 2e 30 2d 73 72 63 2e 74 61 72 2e 62 7a 32 
3f 72 65 73 70 6f 6e 73 65 2d 63 6f 6e 74 65 6e 74 2d 64 69 73 70 6f 73 69 74 
69 6f 6e 3d 61 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 
70 79 70 79 2d 32 2e 35 2e 30 2d 73 72 63 2e 74 61 72 2e 62 7a 32 
22</StringToSignBytes><RequestId>2F8F9BE6D8E16461</RequestId><HostId>2FCFrySY8ND/Fre+9C2iP42xNSucboxUnTK35Ycxiroa4YHfqCNX8z1jYgS7dZcXTmm+b1Kj0fs=</HostId></Error>]
 done.
2015-03-13 22:03:26 ERROR 403: Forbidden.

[IRI fallbacking to non-utf8 for 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22'
--2015-03-13 22:03:26--  
https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
Found bitbucket.org in host_name_addresses_map (0x7f8ec60bc5f0)
Connecting to bitbucket.org (bitbucket.org)|131.103.20.168|:443... connected.
Created socket 4.
Releasing 0x00007f8ec60bc5f0 (new refcount 1).

---request begin---
GET /pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bitbucket.org
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 FOUND
Server: nginx/1.6.2
Date: Fri, 13 Mar 2015 22:03:26 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Served-By: app22
X-Render-Time: 0.022087097168
Content-Language: en
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
Cache-Control: max-age=0
X-Request-Count: 211
Expires: Fri, 13 Mar 2015 22:03:26 GMT
Vary: Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN
Last-Modified: Fri, 13 Mar 2015 22:03:26 GMT
X-Static-Version: c524695e7f84
X-Version: c524695e7f84
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Cache-Status: MISS

---response end---
302 FOUND
Disabling further reuse of socket 5.
Registered socket 4 for persistent reuse.
URI content encoding = 'utf-8'
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
 [following]
] done.
URI content encoding = None
converted 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22'
 (ANSI_X3.4-1968) -> 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;
 filename="pypy-2.5.0-src.tar.bz2"' (UTF-8)
--2015-03-13 22:03:26--  
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22
Found bbuseruploads.s3.amazonaws.com in host_name_addresses_map (0x7f8ec6577ea0)
Connecting to bbuseruploads.s3.amazonaws.com 
(bbuseruploads.s3.amazonaws.com)|54.231.2.89|:443... connected.
Created socket 5.
Releasing 0x00007f8ec6577ea0 (new refcount 1).

---request begin---
GET 
/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22
 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bbuseruploads.s3.amazonaws.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 403 Forbidden
x-amz-request-id: E8F8F2BC7169654E
x-amz-id-2: 
RMcOOaENfcLunyhQx5oGD4/IrDJZP9/wVWR4mh65rUModk9xos++n+1mJdv+NIAUafoi2keHwEU=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 13 Mar 2015 22:03:27 GMT
Server: AmazonS3

---response end---
403 Forbidden
Disabling further reuse of socket 4.
Registered socket 5 for persistent reuse.
Skipping 512 bytes of body: [<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we 
calculatSkipping 512 bytes of body: [ed does not match the signature you 
provided. Check your key and signing 
method.</Message><AWSAccessKeyId>0EMWEFSGA12Z1HF1TZ82</AWSAccessKeyId><StringToSign>GET


1426285236
/bbuseruploads/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?response-content-disposition=attachment;
 filename="pypy-2.5.0-src.tar.bz2"</StringToSign><SignatureProvided>up6 
tTpSFVJ2E/0sD6JvfNQkeJg=</SignatureProvided><StringToSignBytes>47 45 54 0a 0a 
0a 31 34 32 36 32 38 35 32 33 36 0a 2f 62 62 75 73 65 72 75 70 6c 6f 61 64 73 
2f 70 79 70 Skipping 501 bytes of body: [79 2f 70 79 70 79 2f 64 6f 77 6e 6c 6f 
61 64 73 2f 70 79 70 79 2d 32 2e 35 2e 30 2d 73 72 63 2e 74 61 72 2e 62 7a 32 
3f 72 65 73 70 6f 6e 73 65 2d 63 6f 6e 74 65 6e 74 2d 64 69 73 70 6f 73 69 74 
69 6f 6e 3d 61 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 
70 79 70 79 2d 32 2e 35 2e 30 2d 73 72 63 2e 74 61 72 2e 62 7a 32 
22</StringToSignBytes><RequestId>E8F8F2BC7169654E</RequestId><HostId>RMcOOaENfcLunyhQx5oGD4/IrDJZP9/wVWR4mh65rUModk9xos++n+1mJdv+NIAUafoi2keHwEU=</HostId></Error>]
 done.
2015-03-13 22:03:27 ERROR 403: Forbidden.

[IRI fallbacking to non-utf8 for 
'https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6+tTpSFVJ2E/0sD6JvfNQkeJg=&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment;%20filename=%22pypy-2.5.0-src.tar.bz2%22'
--2015-03-13 22:03:27--  
https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
Found bitbucket.org in host_name_addresses_map (0x7f8ec60bc5f0)
Connecting to bitbucket.org (bitbucket.org)|131.103.20.168|:443... 
DEBUG output created by Wget 1.16.3 on linux-gnu.

URI encoding = ‘UTF-8’
--2015-03-13 22:03:38--  
https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
Certificates loaded: 174
Resolving bitbucket.org (bitbucket.org)... 131.103.20.167, 131.103.20.168
Caching bitbucket.org => 131.103.20.167 131.103.20.168
Connecting to bitbucket.org (bitbucket.org)|131.103.20.167|:443... connected.
Created socket 4.
Releasing 0x00007fb8b9e6a900 (new refcount 1).

---request begin---
GET /pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bitbucket.org
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 FOUND
Server: nginx/1.6.2
Date: Fri, 13 Mar 2015 22:03:39 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Served-By: app19
X-Render-Time: 0.0209329128265
Content-Language: en
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
Cache-Control: max-age=0
X-Request-Count: 31
Expires: Fri, 13 Mar 2015 22:03:39 GMT
Vary: Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN
Last-Modified: Fri, 13 Mar 2015 22:03:39 GMT
X-Static-Version: c524695e7f84
X-Version: c524695e7f84
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Cache-Status: MISS

---response end---
302 FOUND
Registered socket 4 for persistent reuse.
URI content encoding = ‘utf-8’
Location: 
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
 [following]
] done.
URI content encoding = None
--2015-03-13 22:03:39--  
https://bbuseruploads.s3.amazonaws.com/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 
54.231.2.89
Caching bbuseruploads.s3.amazonaws.com => 54.231.2.89
Connecting to bbuseruploads.s3.amazonaws.com 
(bbuseruploads.s3.amazonaws.com)|54.231.2.89|:443... connected.
Created socket 5.
Releasing 0x00007fb8b9e6d530 (new refcount 1).

---request begin---
GET 
/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2?Signature=up6%2BtTpSFVJ2E%2F0sD6JvfNQkeJg%3D&Expires=1426285236&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82&response-content-disposition=attachment%3B%20filename%3D%22pypy-2.5.0-src.tar.bz2%22
 HTTP/1.1
User-Agent: Wget/1.16.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: bbuseruploads.s3.amazonaws.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK
x-amz-id-2: 
q54CUOz21ri4jTkV7Mpvzoxd30OeBckCsNEOWQUSTb269KeDACTkuHroCrIv5iiNvMqTr+4wjMo=
x-amz-request-id: 6771AAB12CFE9B28
Date: Fri, 13 Mar 2015 22:03:40 GMT
Content-Disposition: attachment; filename="pypy-2.5.0-src.tar.bz2"
Last-Modified: Tue, 03 Feb 2015 11:28:45 GMT
ETag: "f4700c0af45e986178b36ce91a45136e"
Accept-Ranges: bytes
Content-Type: application/x-tar
Content-Length: 15065106
Server: AmazonS3

---response end---
200 OK
Disabling further reuse of socket 4.
Registered socket 5 for persistent reuse.
Length: 15065106 (14M) [application/x-tar]
Saving to: ‘pypy-2.5.0-src.tar.bz2.1’

     0K .......... .......... .......... .......... ..........  0%  213K 69s
    50K .......... .......... .......... .......... ..........  0%  423K 52s
   100K .......... .......... .......... .......... ..........  1%  433K 45s
(etc. etc.)
