On 07/02/13 15:06, bes wrote:
> Hi,
>
> i found some bug in wget with interpreting and save percent-encoding 3 byte
> utf8 url
>
> example:
> 1. Create url with "—". This is U+2014 (EM DASH). Percent-encoding UTF-8 is
> "%E2%80%94"
> 2. Try wget it: wget "http://example.com/abc—d" or wget "
> http://example.com/abc%E2%80%94d" directly
> 3. Wget save this URL to file "abc\342%80%94d". Expected is
> "abc%E2%80%94d". This is a bug.

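The problem is that wget decides what to escape one byte at a time, by checking whether each byte is a printable character in Latin-1. In the UTF-8 sequence E2 80 94, the byte 0xE2 (octal 342) is printable in Latin-1, so it is written out as-is, while 0x80 and 0x94 fall in the Latin-1 control range and stay percent-encoded. That is where "abc\342%80%94d" comes from.

This is tracked at https://savannah.gnu.org/bugs/index.php?37564

An option would be to use --restrict-file-names=nocontrol to get the em dash in the filename, instead of the percent-encoded version.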
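
To make the byte-by-byte behaviour concrete, here is a rough Python sketch of the effect. It is only a simplified model, not wget's actual code: the function names and the escape_control parameter are made up for illustration, and wget's real quoting handles more cases (slashes, OS-specific restrictions and so on).

```python
from urllib.parse import unquote_to_bytes


def is_latin1_control(byte: int) -> bool:
    # C0 controls, DEL, and the C1 range 0x80-0x9F count as
    # "not printable" when the byte is read as Latin-1.
    return byte < 0x20 or 0x7F <= byte <= 0x9F


def model_filename(url_path: str, escape_control: bool = True) -> bytes:
    # Hypothetical model: decode the percent-encoding first, then
    # re-escape every byte that looks like a Latin-1 control character.
    # escape_control=False stands in for --restrict-file-names=nocontrol.
    raw = unquote_to_bytes(url_path)
    out = bytearray()
    for b in raw:
        if escape_control and is_latin1_control(b):
            out += f"%{b:02X}".encode("ascii")
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    # 0xE2 is kept raw, 0x80 and 0x94 are escaped again.
    print(model_filename("abc%E2%80%94d"))
    # With control escaping off, the bytes form a valid UTF-8 em dash.
    print(model_filename("abc%E2%80%94d", escape_control=False))
```

With the default, the model gives the bytes abc\xe2%80%94d (\xe2 is \342 in octal), which matches the reported filename. With escape_control=False all three bytes survive, so the name decodes as UTF-8 with the em dash intact.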
