Re: [Bug-wget] bad filename

Bykov Aleksey Wed, 23 Apr 2014 06:58:11 -0700

Greetings, Darshit Shah
This was discussed some time (perhaps a long time) ago.
Possible logic:
If the locale isn't UTF-8, process as before; otherwise:
1. Convert the string to a wide-character string with mbstowcs().
2. For each wide character, check its size with wctomb(). If the size is 1,
compare it with the mask; if the character is restricted, do "quoted++;".
3. If needed, convert to lower/upper case with towlower()/towupper().
4. Rebuild the string character by character with wctomb(): convert each
character into a temporary buffer. If its size is 1, compare it with the mask
and replace it if restricted; otherwise "memcpy(q, char_buffer, char_size);
q += char_size;".
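The four steps above could be sketched in portable C roughly as follows. This is a minimal illustration, not the attached patch: the names quote_filename(), quote_bytewise(), case_byte() and is_restricted_byte() are hypothetical, the mask is assumed to be '/' plus the control ranges 0-31 and 128-159 (what wget's manual lists for the default "unix" mode), and restricted bytes are assumed to be escaped as %XX. Any multibyte locale (MB_CUR_MAX > 1) is treated as the UTF-8 case. Step 3 is done before the size check of step 2, because towlower()/towupper() can change a character's encoded length.

```c
#include <ctype.h>    /* tolower, toupper */
#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* mbstowcs, wctomb, malloc, free, MB_CUR_MAX */
#include <string.h>   /* memcpy, strlen */
#include <wchar.h>    /* wchar_t, wint_t */
#include <wctype.h>   /* towlower, towupper */

enum case_mode { CASE_KEEP, CASE_LOWER, CASE_UPPER };

/* Hypothetical mask (assumption): '/' plus the control ranges 0-31 and
   128-159, i.e. what wget's manual lists for the default "unix" mode. */
static int is_restricted_byte(unsigned char c)
{
    return c == '/' || c < 32 || (c >= 128 && c < 160);
}

/* Write c as "%XX" and return the advanced output pointer. */
static char *append_escaped(char *q, unsigned char c)
{
    static const char hex[] = "0123456789ABCDEF";
    *q++ = '%';
    *q++ = hex[c >> 4];
    *q++ = hex[c & 0xF];
    return q;
}

static unsigned char case_byte(unsigned char c, enum case_mode mode)
{
    if (mode == CASE_LOWER) return (unsigned char)tolower(c);
    if (mode == CASE_UPPER) return (unsigned char)toupper(c);
    return c;
}

/* "Process as before": every byte is tested against the mask on its own. */
static char *quote_bytewise(const char *s, enum case_mode mode)
{
    size_t len = strlen(s), quoted = 0;
    for (size_t i = 0; i < len; i++)
        if (is_restricted_byte(case_byte((unsigned char)s[i], mode)))
            quoted++;
    char *out = malloc(len + 2 * quoted + 1);
    if (!out) return NULL;
    char *q = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = case_byte((unsigned char)s[i], mode);
        if (is_restricted_byte(c)) q = append_escaped(q, c);
        else *q++ = (char)c;
    }
    *q = '\0';
    return out;
}

/* Returns a malloc'd quoted copy of s, or NULL on failure. */
char *quote_filename(const char *s, enum case_mode mode)
{
    /* Single-byte locale, or bytes invalid in this locale: old behaviour. */
    size_t n = (MB_CUR_MAX > 1) ? mbstowcs(NULL, s, 0) : (size_t)-1;
    if (n == (size_t)-1)
        return quote_bytewise(s, mode);

    /* Step 1: multibyte string -> wide-character string. */
    wchar_t *ws = malloc((n + 1) * sizeof *ws);
    if (!ws) return NULL;
    mbstowcs(ws, s, n + 1);

    /* Step 3, done first: case mapping may change the encoded length. */
    for (size_t i = 0; i < n; i++) {
        if (mode == CASE_LOWER) ws[i] = (wchar_t)towlower((wint_t)ws[i]);
        else if (mode == CASE_UPPER) ws[i] = (wchar_t)towupper((wint_t)ws[i]);
    }

    /* Step 2: size each character with wctomb(); count restricted 1-byte ones. */
    char buf[MB_LEN_MAX];
    size_t bytes = 0, quoted = 0;
    wctomb(NULL, 0);                              /* reset shift state */
    for (size_t i = 0; i < n; i++) {
        int len = wctomb(buf, ws[i]);
        if (len < 0) { free(ws); return NULL; }   /* unrepresentable */
        bytes += (size_t)len;
        if (len == 1 && is_restricted_byte((unsigned char)buf[0])) quoted++;
    }

    /* Step 4: rebuild char by char; multibyte characters are copied as-is. */
    char *out = malloc(bytes + 2 * quoted + 1);
    if (!out) { free(ws); return NULL; }
    char *q = out;
    wctomb(NULL, 0);
    for (size_t i = 0; i < n; i++) {
        int len = wctomb(buf, ws[i]);             /* >= 0: checked in step 2 */
        if (len == 1 && is_restricted_byte((unsigned char)buf[0]))
            q = append_escaped(q, (unsigned char)buf[0]);
        else { memcpy(q, buf, (size_t)len); q += len; }
    }
    *q = '\0';
    free(ws);
    return out;
}
```

The caller is expected to have selected the user's locale first (e.g. setlocale(LC_ALL, "")). Under a UTF-8 locale, "ש._שפרה" should then come back unchanged, while a '/' in the name still becomes %2F; under the "C" locale the old byte-wise result is produced.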
On Windows I can't check this (mbstowcs() doesn't work with UTF-8 there, so
MultiByteToWideChar() must be used instead...).
A patch for Windows (unstructured, unclear, and unfinished, but working) is attached.
Best Regards, Bykov Aleksey.


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
From: [email protected]
To: 
Date: 13:59:43, 04.23.2014
Subject: Re: [Bug-wget] bad filename
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~



>>On Tue, Apr 22, 2014 at 10:57 PM, Andries E. Brouwer
>> <[email protected]> wrote:
>> > If I ask wget to download the wikipedia page
>> >
>> > http://he.wikipedia.org/wiki/ש._שפרה
>> >
>> > then I hope for a resulting file ש._שפרה.
>> > Instead, wget gives me ש._שפר\327%94, where the \327
>> > is an unpronounceable byte that cannot be typed
>> > (This is an UTF-8 system and the filename
>> > that wget produces is not valid UTF-8.)
>> >
>> > Maybe it would be better if wget by default used the original filename.
>> > This name mangling is a vestige of old times, it seems to me.
>> >
>> > Andries
>> >
>> 
>> This is a commonly reported grievance and as you correctly mention a
>> vestige of old times. With UTF-8 supported filesystems, Wget should
>> simply write the correct characters.
>> 
>> I sincerely hope this issue is resolved as fast as possible, but I
>> know not how to. Those who understand i18n should work on this.
>> 
>> 
>> -- 
>> Thanking You,
>> Darshit Shah
>> 
>> 
>>
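The claim in the quoted report that the produced name is not valid UTF-8 can be checked byte by byte. In UTF-8, ה (U+05D4) is the two bytes 0xD7 0x94 (octal \327 \224). The mangled name keeps the lead byte \327 but replaces the continuation byte with the ASCII text %94, presumably because 0x94 falls in the 128-159 range that wget's manual lists among the control characters escaped by the default "unix" mode. The helper looks_like_utf8() below is hypothetical and only a simplified structural check, not a complete validator.

```c
#include <stddef.h>   /* size_t */

/* Simplified structural UTF-8 check (sketch): each lead byte fixes how many
   continuation bytes (10xxxxxx) must follow.  Overlong forms and surrogates
   are not rejected. */
int looks_like_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    while (*s) {
        size_t extra;
        if (*s < 0x80) extra = 0;                  /* ASCII */
        else if ((*s & 0xE0) == 0xC0) extra = 1;   /* 110xxxxx */
        else if ((*s & 0xF0) == 0xE0) extra = 2;   /* 1110xxxx */
        else if ((*s & 0xF8) == 0xF0) extra = 3;   /* 11110xxx */
        else return 0;            /* stray continuation byte or bad lead */
        s++;
        for (size_t i = 0; i < extra; i++, s++)
            if ((*s & 0xC0) != 0x80) return 0;     /* also catches early NUL */
    }
    return 1;
}
```

Applied to the original name it returns 1; applied to the mangled name it returns 0, because \327 is followed by '%' rather than by a continuation byte.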

url_c.diff
Description: Binary data
