Re: [Bug-wget] bad filenames (again)

Andries E. Brouwer Mon, 17 Aug 2015 07:03:40 -0700

On Mon, Aug 17, 2015 at 01:17:06PM +0200, Tim Ruehsen wrote:

> @Andries: Maybe you can put a few more test cases into that
> (or send me a few examples that should work).
> I also would like to see broken UTF-8 sequences in this test.


By coincidence, Noël Köthe just sent a bug report
that provides one more test case.

Fetch http://zh.wikipedia.org/wiki/%E9%A6%96%E9%A1%B5.

One hopes to get a file with file name 首页, that is,
with bytes e9 a6 96 e9 a1 b5, and that is what the patched wget gives.
The unpatched wget instead produces an unpronounceable name with
bytes e9 a6 25 39 36 e9 a1 b5 (because the byte 96 was escaped into "%96").
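
The byte arithmetic can be checked with a small Python sketch. This is
not wget code: the helper escape_high_control_bytes is hypothetical, and it
assumes only that the unpatched behaviour percent-escapes bytes in the
range 0x80-0x9f, which is consistent with this example but not taken
from the wget source.

```python
from urllib.parse import unquote_to_bytes


def escape_high_control_bytes(raw: bytes) -> bytes:
    """Hypothetical stand-in for the unpatched behaviour.

    Assumption: bytes 0x80-0x9f are replaced by "%XX"; all other bytes
    are copied unchanged.
    """
    out = bytearray()
    for b in raw:
        if 0x80 <= b <= 0x9F:
            out += b"%%%02X" % b  # e.g. 0x96 becomes b"%96"
        else:
            out.append(b)
    return bytes(out)


def hex_bytes(raw: bytes) -> str:
    """Format bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in raw)


# Last path component of the URL, percent-decoded to raw bytes.
decoded = unquote_to_bytes("%E9%A6%96%E9%A1%B5")
escaped = escape_high_control_bytes(decoded)

print(hex_bytes(decoded))  # e9 a6 96 e9 a1 b5
print(hex_bytes(escaped))  # e9 a6 25 39 36 e9 a1 b5
print(decoded.decode("utf-8"))  # 首页
```

The second byte of 首 is 0x96, which falls in the assumed escaped range,
so a valid three-byte UTF-8 sequence is split by the three ASCII
characters "%96".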

Andries



[Here it is clear what one wants. In examples with broken UTF-8
sequences, something will happen as a result of the present code.
It is unclear whether we want that or not. Changing the filename
is bad, but invalid UTF-8 is also bad. Today I prefer the unchanged
filename, but see no need for a test that checks that we really get that.]
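
For reference, a minimal Python sketch of what "broken UTF-8" inputs
might look like. The sample byte strings and the helper is_valid_utf8
are illustrative assumptions, not cases from the wget test suite, and
the sketch only classifies the bytes; it says nothing about what
filename wget should produce for them.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative samples (assumptions, chosen for this sketch).
samples = {
    "complete sequence": b"\xe9\xa6\x96\xe9\xa1\xb5",
    "truncated sequence": b"\xe9\xa6",
    "stray continuation byte": b"\x96abc",
    "overlong encoding": b"\xc0\xaf",
}

results = {name: is_valid_utf8(raw) for name, raw in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'valid' if ok else 'broken'}")
```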
