Greetings, Darshit Shah.

This was discussed some (or a long) time ago. A possible approach:

If the locale isn't UTF-8, process names as before; otherwise:
1. Convert the string to a wide-character string with mbstowcs().
2. For each wide character, check its multibyte size with wctomb(). If the size is 1, compare it with the mask; if the character is restricted, do "quoted++;".
3. If needed, convert to lower/upper case with towlower()/towupper().
4. Rebuild the string character by character with wctomb(): convert each character into a temporary buffer. If its size is 1, compare it with the mask and replace it if it is restricted. Otherwise do "memcpy(q, char_buffer, char_size); q += char_size;".

I can't check this on Windows (mbstowcs() doesn't work with UTF-8 there, so MultiByteToWideChar() must be used instead...). A patch for Windows (unstructured, unclear, unfinished, but working) is attached.

Best Regards,
Bykov Aleksey.
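The four steps above could look roughly like this in C. This is a minimal sketch, assuming the caller has already selected a UTF-8 locale with setlocale() and checked it; otherwise the old byte-wise path applies. `mangle_multibyte`, `is_restricted_byte` and `enum case_mode` are hypothetical names, and the mask below only escapes '/' and ASCII control characters as %XX; it is not wget's real restriction table.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale(), called by the user of this code */
#include <stdio.h>    /* sprintf */
#include <stdlib.h>   /* mbstowcs, wctomb, malloc, free */
#include <string.h>   /* memcpy */
#include <wctype.h>   /* towlower, towupper */

enum case_mode { CASE_KEEP, CASE_LOWER, CASE_UPPER };

/* Hypothetical mask: single-byte characters that must be escaped.
   Illustration only; wget's actual table is different. */
static int
is_restricted_byte (unsigned char c)
{
  return c == '/' || c < 0x20 || c == 0x7f;
}

/* Returns a newly allocated name, or NULL if NAME is not valid in the
   current locale's multibyte encoding (or on allocation failure). */
static char *
mangle_multibyte (const char *name, enum case_mode mode)
{
  /* Step 1: convert to a wide-character string. */
  size_t wlen = mbstowcs (NULL, name, 0);
  if (wlen == (size_t) -1)
    return NULL;
  wchar_t *wide = malloc ((wlen + 1) * sizeof *wide);
  if (!wide)
    return NULL;
  mbstowcs (wide, name, wlen + 1);

  /* Steps 3 and 2: case-convert first, then measure each character and
     reserve 3 bytes ("%XX") for every restricted single-byte character. */
  char buf[MB_LEN_MAX];
  size_t out_len = 0;
  for (size_t i = 0; i < wlen; i++)
    {
      if (mode == CASE_LOWER)
        wide[i] = towlower (wide[i]);
      else if (mode == CASE_UPPER)
        wide[i] = towupper (wide[i]);
      int n = wctomb (buf, wide[i]);
      if (n < 0)
        {
          free (wide);
          return NULL;
        }
      if (n == 1 && is_restricted_byte ((unsigned char) buf[0]))
        out_len += 3;
      else
        out_len += (size_t) n;
    }

  /* Step 4: rebuild the string character by character. */
  char *out = malloc (out_len + 1);
  if (!out)
    {
      free (wide);
      return NULL;
    }
  char *q = out;
  for (size_t i = 0; i < wlen; i++)
    {
      int n = wctomb (buf, wide[i]);
      if (n == 1 && is_restricted_byte ((unsigned char) buf[0]))
        q += sprintf (q, "%%%02X", (unsigned char) buf[0]);
      else
        {
          memcpy (q, buf, (size_t) n);   /* multibyte char copied whole */
          q += n;
        }
    }
  *q = '\0';
  free (wide);
  return out;
}
```

With a UTF-8 locale set, mangle_multibyte("a/b", CASE_UPPER) gives "A%2FB", while a Hebrew name passes through unchanged because each letter encodes to 2 bytes and is never compared with the mask. The sketch applies the case conversion before measuring, since in UTF-8 a case mapping can change the encoded length (U+212A KELVIN SIGN lowercases to ASCII 'k').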
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
From: [email protected]
To:
Date: 13:59:43, 04.23.2014
Subject: Re: [Bug-wget] bad filename
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

>> On Tue, Apr 22, 2014 at 10:57 PM, Andries E. Brouwer
>> <[email protected]> wrote:
>> > If I ask wget to download the wikipedia page
>> >
>> > http://he.wikipedia.org/wiki/ש._שפרה
>> >
>> > then I hope for a resulting file ש._שפרה.
>> > Instead, wget gives me ש._שפר\327%94, where the \327
>> > is an unpronounceable byte that cannot be typed
>> > (This is an UTF-8 system and the filename
>> > that wget produces is not valid UTF-8.)
>> >
>> > Maybe it would be better if wget by default used the original filename.
>> > This name mangling is a vestige of old times, it seems to me.
>> >
>> > Andries
>>
>> This is a commonly reported grievance and as you correctly mention a
>> vestige of old times. With UTF-8 supported filesystems, Wget should
>> simply write the correct characters.
>>
>> I sincerely hope this issue is resolved as fast as possible, but I
>> know not how to. Those who understand i18n should work on this.
>>
>> --
>> Thanking You,
>> Darshit Shah
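Where Andries's "\327%94" comes from: the last letter ה (U+05D4) is encoded in UTF-8 as the two bytes 0xD7 0x94, and \327 is 0xD7 in octal. The wget manual says that in "unix" mode --restrict-file-names escapes '/' and the control characters in the ranges 0-31 and 128-159. Since 0x94 (148) falls in the second range, a byte-wise filter escapes the second half of the letter and leaves the first half bare. The sketch below is a simplified model of that byte-wise behaviour, not wget's actual code; `unix_restricted` and `escape_bytewise` are hypothetical names.

```c
#include <stdio.h>    /* sprintf */
#include <string.h>   /* strcmp, for callers comparing results */

/* Simplified model of the documented "unix" restriction:
   '/' plus bytes 0-31 and 128-159 are escaped. */
static int
unix_restricted (unsigned char c)
{
  return c == '/' || c <= 31 || (c >= 128 && c <= 159);
}

/* Escapes IN byte by byte into OUT, which must hold at least
   3 * strlen (IN) + 1 bytes. Multibyte characters are not recognised. */
static void
escape_bytewise (const char *in, char *out)
{
  for (const unsigned char *p = (const unsigned char *) in; *p; p++)
    {
      if (unix_restricted (*p))
        out += sprintf (out, "%%%02X", *p);
      else
        *out++ = (char) *p;
    }
  *out = '\0';
}
```

The other letters of ש._שפרה survive only because their second bytes (0xA9, 0xA4, 0xA8) happen to lie above 159. Deciding per character rather than per byte, as in the multibyte approach, avoids splitting a letter this way.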
url_c.diff
Description: Binary data
