Follow-up Comment #8, bug #60287 (project wget):
> Is this because wget first downloads the html file and then reads the
> contents off disk
No. It's because Wget downloads the pages you told it to and saves them as
disk files. Any links in the downloaded pages that lead to other pages
produce additional disk files (for example, if you told Wget to download
recursively).
IOW, the file-name encoding issue happens when a Web page needs to be saved to
a file for some reason.
> If the bytes were downloaded with the correct encoding, and written to the
> file system with the correct encoding, I would expect it to be able to parse
> the file with the correct encoding.
What is the "correct encoding", though?
> the file `wget-test.html` has no non-ascii characters in it
Of course it doesn't: the non-ASCII characters appear only when we decode the
hex-encoded bytes.
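
To illustrate why "correct encoding" is ambiguous here, a minimal Python
sketch follows. It is not taken from Wget's source. It assumes "hex-encoded"
means URL percent-encoding, and the link target `caf%C3%A9.html` is made up
for the example. The link text is pure ASCII, yet the decoded bytes turn into
different file names depending on which charset is assumed.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical link as it might appear in an all-ASCII HTML file.
href = "caf%C3%A9.html"

# Percent-decoding yields raw bytes, not characters.
raw = unquote_to_bytes(href)

# The same bytes become different file names under different charsets.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

print(href.isascii())    # is the link text itself pure ASCII?
print(raw)               # the bytes after percent-decoding
print(ascii(as_utf8))    # file name if the bytes are taken as UTF-8
print(ascii(as_latin1))  # file name if the bytes are taken as Latin-1
```

Nothing in the bytes themselves says which of the two names is the intended
one; that information has to come from somewhere else, such as the page's
declared charset or the local locale.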
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60287>