I pushed a conversion fix to master.

There is another bug in wget that shows up with

  wget -d --local-encoding=cp1255 'http://he.wikipedia.org/wiki/%F9._%F9%F4%F8%E4'

Wget double escapes/converts to UTF-8... Maybe you can address this when you
are working on the code !?

Tim

On Tuesday 15 December 2015 10:33:10 Tim Ruehsen wrote:
> On Monday 14 December 2015 18:33:38 Eli Zaretskii wrote:
> > > Date: Sun, 13 Dec 2015 20:04:31 +0100
> > > From: "Andries E. Brouwer" <[email protected]>
> > > Cc: "Andries E. Brouwer" <[email protected]>, [email protected]
> > >
> > > On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> > > > If no one is going to pick up the gauntlet, I will sit down and do it
> > > > myself, although I'm terribly busy with Emacs 25.1 release.
> > >
> > > Good!
> >
> > While working on this, I bumped into 2 related issues:
> >   1. The functions that call 'iconv' (in iri.c) don't make a point of
> >      flushing the last portion of the converted URL after 'iconv'
> >      returns successfully having converted the input string in its
> >      entirety.  IME, you need then to call 'iconv' one last time with
> >      either the 2nd or the 3rd argument set to NULL, otherwise
> >      sometimes the last converted character doesn't get output.  In my
> >      case, some URLs converted from CP1255 to UTF-8 lost their last
> >      character.  It sounds like no one has actually used this
> >      conversion in iri.c, except for trivially converting UTF-8 to
> >      itself.  Is that possible/reasonable?
>
> You are absolutely right.
>
> Attached is a small test C code that shows (and fixes) the problem.
>
> Regards, Tim
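
For reference, the flush step described in item 1 can be sketched roughly as
follows. This is neither the attached test code nor the code in wget's iri.c;
the helper name convert_and_flush and the buffer sizing are assumptions made
for illustration only. The point it shows is the extra 'iconv' call with a
NULL input pointer after the whole input has been consumed, so that anything
the converter still holds is written to the output buffer.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not wget's code): convert the NUL-terminated string
   IN from charset FROM to charset TO.  Returns a malloc'ed, NUL-terminated
   string that the caller must free, or NULL on failure.  */
char *convert_and_flush(const char *from, const char *to, const char *in)
{
    assert(from && to && in);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return NULL;

    size_t inleft = strlen(in);
    size_t cap = inleft * 4 + 16;   /* initial guess, grown on E2BIG */
    char *out = malloc(cap + 1);    /* +1 leaves room for the final NUL */
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *) in;        /* iconv takes char **; data is not modified */
    char *outp = out;
    size_t outleft = cap;
    int flushing = 0;

    for (;;) {
        /* Normal conversion first; once the input is fully consumed, make
           one more call with a NULL input to flush pending output.  */
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)
            : iconv(cd, &inp, &inleft, &outp, &outleft);

        if (rc != (size_t) -1) {
            if (flushing)
                break;              /* flush succeeded, we are done */
            flushing = 1;           /* input consumed, now flush */
            continue;
        }

        if (errno != E2BIG) {       /* invalid or incomplete input */
            free(out);
            iconv_close(cd);
            return NULL;
        }

        /* Output buffer too small: enlarge it and retry the same call.  */
        size_t used = (size_t) (outp - out);
        cap *= 2;
        char *tmp = realloc(out, cap + 1);
        if (!tmp) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = tmp;
        outp = out + used;
        outleft = cap - used;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A call such as convert_and_flush("CP1255", "UTF-8", url) would correspond to
the case discussed above, provided the platform's iconv supports CP1255.
Whether skipping the flush actually loses output depends on the iconv
implementation and the encodings involved.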
