URL:
<https://savannah.gnu.org/bugs/?60287>
Summary: Windows recursive download escapes utf8 URLs twice
Project: GNU Wget
Submitted by: cinderblock
Submitted on: Thu 25 Mar 2021 09:09:42 AM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: 1.20
Discussion Lock: Any
Operating System: Microsoft Windows
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: No
_______________________________________________________
Details:
Steps to reproduce:
1. On a web server, create an HTML file with the contents:
<a href="space-ok%20cyrillic-not%D0%B3.txt">target-with-other-char</a>
2. Download that file recursively: `wget -r http://example.com/wget-test.html`
On Linux, we get the expected (truncated) result:
...
2021-03-25 02:01:59 (4.51 MB/s) - ‘example.com/wget-test.html’ saved [71]
--2021-03-25 02:01:59-- http://example.com/space-ok%20cyrillic-not%D0%B3.txt
...
However, on Windows the percent-encoded UTF-8 character (%D0%B3) is mangled
in the followed link, and the download fails:
...
2021-03-25 02:02:29 (4.51 MB/s) - ‘example.com/wget-test.html’ saved [71]
--2021-03-25 02:02:29-- http://example.com/space-ok%20cyrillic-not%C3%90%C2%B3.txt
...
Note that the space (%20) is not mangled.
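
The mangled sequence is consistent with the two UTF-8 bytes (D0 B3) being
read as two single-byte characters and then encoded as UTF-8 a second time.
The following Python sketch only reproduces the observed output; it is not
wget's code, and the single-byte charset (Latin-1 here, Windows-1252 gives the
same result for these bytes) is an assumption that has not been checked against
the wget source. The function name `double_encode` is made up for illustration.

```python
from urllib.parse import quote, unquote_to_bytes


def double_encode(escaped: str, assumed_charset: str = "latin-1") -> str:
    """Mimic the suspected double escaping (illustration only, not wget code)."""
    # Undo the percent-encoding to get the raw bytes from the href.
    raw = unquote_to_bytes(escaped)
    # Assumption: the bytes are misread as a single-byte charset,
    # so each UTF-8 byte becomes its own character.
    misread = raw.decode(assumed_charset)
    # Encoding those characters as UTF-8 and escaping again doubles the bytes.
    return quote(misread.encode("utf-8"))


print(double_encode("space-ok%20cyrillic-not%D0%B3.txt"))
# space-ok%20cyrillic-not%C3%90%C2%B3.txt
```

ASCII bytes such as the space survive the round trip unchanged, which matches
the observation that %20 is not mangled.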