Hi there, I'm afraid I cannot reproduce it in the latest git snapshot.
The resulting link is exactly the same on the website (online) and in the
downloaded content:

http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
vs
file:///home/aja/codebase/wget/www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html

When opening 'reference.html' in my browser and clicking on the link, it's
true that the browser itself converts it from %2F to %252F, but I didn't get
a 404 in either case.

What's more, if the downloaded content looks exactly the same as the online
one, I don't think we can consider this a bug.

Additionally, we had a similar problem a while ago, which was (apparently)
resolved in commit b0820d553b6bef4400c493474d38930fee461b45. However, those
changes have not been released yet.

So, which Wget version are you using? Could you please confirm whether the
issue persists in the latest git snapshot?

Thanks.

- AJ

On Sun, 2015-09-27 at 14:29 -0700, Barry Allard wrote:
> # skips all double-encoded [ui]ris because it reinterprets them, outside
> uri.c:reencode_escapes(), probably in iri.c.
> wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>
> # works
> wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>
> Correct [ui]ri:
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html
> (200)
> Incorrect [ui]ri:
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
> (404)
> # pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
>
> Simple-but-incomplete hackaround: use --no-iri
>
> To improve compatibility with mirroring international sites, the iri code
> path could approximate behavior of url.c/url_parse() by avoiding unnecessary
> modification to --mirror extracted [ui]ris, possibly around the time it
> adds/dequeues them to/from the queue.
>
> Best,
> Barry Allard
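
P.S. For anyone following the thread, the double decoding described in the report can be illustrated with a small Python sketch. This is not Wget's code: it uses Python's urllib.parse.unquote as a stand-in for the "pcnt_decode" shorthand in the report, and decode_times is a hypothetical helper name chosen only for this example.

```python
from urllib.parse import unquote


def decode_times(path: str, times: int) -> str:
    """Percent-decode `path` `times` times (hypothetical helper, not Wget code)."""
    for _ in range(times):
        path = unquote(path)
    return path


# The link as it should be requested: "%25" is an escaped literal "%".
link = "instruction/XLAT%252FXLATB.html"

once = decode_times(link, 1)   # one decoding pass turns "%252F" into "%2F"
twice = decode_times(link, 2)  # a second pass turns "%2F" into "/"

print(once)
print(twice)
```

Each extra decoding pass changes which resource the path names, which is why a link extracted during --mirror presumably needs to be queued without being decoded and re-encoded again.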
