On Mon, Aug 17, 2015 at 01:17:06PM +0200, Tim Ruehsen wrote:

> @Andries: Maybe you can put a few more test cases into that
> (or send me a few examples that should work).
> I also would like to see broken UTF-8 sequences in this test.

By coincidence, Noël Köthe just sent a bug report that provides one
more test case.

Fetch http://zh.wikipedia.org/wiki/%E9%A6%96%E9%A1%B5

One hopes to get a file named 首页, that is, with bytes
e9 a6 96 e9 a1 b5, and that is what the patched wget gives.
The unpatched wget produces an (unpronounceable) name with bytes
e9 a6 25 39 36 e9 a1 b5, because the byte 96 was escaped into "%96".

Andries

[Here it is clear what one wants. In examples with broken UTF-8
sequences, the present code will produce some result, and it is
unclear whether that result is what we want. Changing the filename
is bad, but illegal UTF-8 is also bad. Today I prefer the unchanged
filename, but see no need for a test that checks that we really
get that.]
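
For anyone who wants to check the byte sequences in the 首页 test case by hand, here is a rough sketch in Python rather than in wget's own C. It is not wget code: `escape_high_controls` is a hypothetical helper that models the unpatched behaviour under the assumption that every byte in the range 0x80-0x9f gets re-escaped as %XX, and `is_valid_utf8` is likewise just an illustrative name.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded last path component of the URL from the bug report.
ENCODED = "%E9%A6%96%E9%A1%B5"


def escape_high_controls(data: bytes) -> bytes:
    """Hypothetical model of the unpatched behaviour (an assumption):
    re-escape every byte in 0x80-0x9f as %XX, copy all other bytes."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += f"%{b:02X}".encode("ascii")
        else:
            out.append(b)
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


raw = unquote_to_bytes(ENCODED)      # bytes one hopes to see in the file name
mangled = escape_high_controls(raw)  # bytes under the modelled escaping

print(raw.hex(" "))                  # e9 a6 96 e9 a1 b5
print(raw.decode("utf-8") == "首页")  # True
print(mangled.hex(" "))              # e9 a6 25 39 36 e9 a1 b5
print(is_valid_utf8(mangled))        # False
```

The last line shows why the escaped name is unpronounceable: once 96 becomes "%96", the lead byte e9 is no longer followed by two continuation bytes, so the result is not valid UTF-8 at all.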
