Hello, I think I've found a bug with wget.
I originally came across this problem when recursively downloading folders that were presented by nginx's fancy-index module. Sometimes a filename would include a "’" [RIGHT SINGLE QUOTATION MARK (U+2019)] and wget would always get a 404 error when downloading the file.

Downloading this simple html file (simplified output of nginx fancy-index) shows the error:

<!DOCTYPE html><html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>RIGHT SINGLE QUOTE TEST</title>
</head><body>
<a href="%E2%80%99">test</a>
</body></html>

Full command line (Windows cmd.exe)

wget -d --no-verbose --tries 0 --continue --show-progress --wait 0.1 --waitretry 5 -e robots=off --rejected-log=rejected.log --recursive --level inf --reject "index.html*,jpg,png,zip" --no-parent --no-host-directories --auth-no-challenge --user xxx --password xxx -P output_dir https://mydomain.com/test/

Debug Output:

DEBUG output created by Wget 1.20.3 on mingw32.

Reading HSTS entries from C:\ProgramData\chocolatey\lib\Wget\tools/.wget-hsts
URI encoding = 'CP1252'
iconv UTF-8 -> CP1252
iconv outlen=60 inlen=30
converted 'https://mydomain.com/test/' (CP1252) -> ' https://mydomain.com/test/' (UTF-8)
URI encoding = 'CP1252'
Enqueuing https://mydomain.com/test/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing 'https://mydomain.com/test/' with 'CP1252'
Dequeuing https://mydomain.com/test/ at depth 0
Queue count 0, maxcount 1.
iconv UTF-8 -> CP1252
iconv outlen=60 inlen=30
converted 'https://mydomain.com/test/' (CP1252) -> ' https://mydomain.com/test/' (UTF-8)
Converted file name 'test/index.html' (UTF-8) -> 'test/index.html' (CP1252)
Auth-without-challenge set, sending Basic credentials.
seconds 0.00, Caching mydomain.com => my.ip.add.ress
seconds 0.00, Created socket 4.
Releasing 0x0000000000b3bf60 (new refcount 1).
Initiating SSL handshake.
seconds 900.00, Winsock error: 0
Handshake successful; connected socket 4 to SSL handle 0x0000000000b52260
certificate:
  subject: CN=mydomain.com
  issuer: CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US
X509 certificate successfully verified and matches host mydomain.com

---request begin---
GET /test/ HTTP/1.1
User-Agent: Wget/1.20.3 (mingw32)
Accept: */*
Accept-Encoding: identity
Authorization: Basic ********
Host: mydomain.com
Connection: Keep-Alive

---request end---
seconds 900.00, Winsock error: 0

---response begin---
HTTP/1.1 200 OK
Server: nginx/1.14.1
Date: Fri, 11 Oct 2019 02:17:57 GMT
Content-Type: text/html
Content-Length: 185
Last-Modified: Fri, 11 Oct 2019 02:17:52 GMT
Connection: keep-alive
Keep-Alive: timeout=20
ETag: "5d9fe650-b9"
Accept-Ranges: bytes

---response end---
Registered socket 4 for persistent reuse.
seconds 900.00, Winsock error: 0
0K 100% 282K=0.001s
2019-10-10 19:17:01 URL:https://mydomain.com/test/ [185/185] -> "E:/test/poops/test/index.html.tmp" [1]
Loaded E:/test/poops/test/index.html.tmp (size 185).
URI encoding = 'CP1252'
E:/test/poops/test/index.html.tmp: merge('https://mydomain.com/test/', '%E2%80%99') -> https://mydomain.com/test/%E2%80%99
iconv UTF-8 -> CP1252
iconv outlen=66 inlen=33
converted 'https://mydomain.com/test/%E2%80%99' (CP1252) -> ' https://mydomain.com/test/’' (UTF-8)
appending 'https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2' to urlpos.
URI content encoding = 'utf-8'
no-follow in E:/test/poops/test/index.html.tmp: 0
Deciding whether to enqueue " https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2".
Decided to load it.
URI encoding = 'utf-8'
Enqueuing https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2 at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing 'https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2' with 'utf-8'
Removing file due to recursive rejection criteria in recursive_retrieve():
Dequeuing https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2 at depth 1
Queue count 0, maxcount 1.
Converted file name 'test/’' (UTF-8) -> 'test/’' (CP1252)
Auth-without-challenge set, sending Basic credentials.
Reusing fd 4.

---request begin---
GET /test/%C3%A2%E2%82%AC%E2%84%A2 HTTP/1.1
Referer: https://mydomain.com/test/
User-Agent: Wget/1.20.3 (mingw32)
Accept: */*
Accept-Encoding: identity
Authorization: Basic ********
Host: mydomain.com
Connection: Keep-Alive

---request end---
seconds 900.00, Winsock error: 0

---response begin---
HTTP/1.1 404 Not Found
Server: nginx/1.14.1
Date: Fri, 11 Oct 2019 02:17:58 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Keep-Alive: timeout=20

---response end---
Skipping 169 bytes of body: [seconds 900.00, Winsock error: 0
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.14.1</center>
</body>
</html>
] done.
https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2:
2019-10-10 19:17:02 ERROR 404: Not Found.
FINISHED --2019-10-10 19:17:02--
Total wall clock time: 1.6s
Downloaded: 1 files, 185 in 0.001s (282 KB/s)

The error is pretty clearly an encoding conversion issue: the link's percent-decoded bytes (E2 80 99) are already UTF-8, but they are assumed to be CP1252 and converted to UTF-8 a second time, which produces the wrong URL (%C3%A2%E2%82%AC%E2%84%A2 instead of %E2%80%99). This is nicely described at the end of this page:
http://www.anchor.com.au/hosting/Character-sets-and-content-encoding-hell

What's not clear to me is whether this is definitely a bug in wget, since there are a couple of other systems involved (nginx, Windows). I'm inclined to think it is wget, however, because I can download the file with Chrome just fine.

I've also had trouble downloading the URL with the single quote directly. However, that could be a problem with cmd.exe and how it encodes and passes strings to wget. So maybe that would be a related bug?

Hope this is a real bug!

Cheers,
- Cameron
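
P.S. The suspected double conversion can be reproduced outside wget. The following is only a minimal Python sketch, assuming the percent-decoded bytes of the link are misread as CP1252 and then re-encoded as UTF-8. The function name double_encode is made up for illustration, and this is not wget's actual code path.

```python
from urllib.parse import quote, unquote_to_bytes


def double_encode(href: str, wrong_charset: str = "cp1252") -> str:
    """Mimic the suspected mis-conversion (hypothetical helper, not wget code)."""
    # Percent-decode the link to raw bytes; for U+2019 these are E2 80 99.
    raw = unquote_to_bytes(href)
    # Assumption: the bytes are wrongly interpreted as CP1252 text.
    misread = raw.decode(wrong_charset)
    # The misread text is then encoded as UTF-8 and percent-encoded again.
    return quote(misread.encode("utf-8"))


if __name__ == "__main__":
    # Prints the same path that shows up in the failing GET request above.
    print(double_encode("%E2%80%99"))
```

The result matches the URL in the 404 request (%C3%A2%E2%82%AC%E2%84%A2), which is consistent with the conversion being applied once too often.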