On Sunday 25 February 2007 14:41, Leonard den Ottolander wrote:
> Hello Pavel,
>
> On Sat, 2007-02-24 at 14:57 +0200, Pavel Tsekov wrote:
> > I'd like to initiate a discussion on how to make MC
> > unicode deal with multibyte character sets.
> >

The current utf-8 patches are based on utf-8 support in glibc. I don't know
if utf-8 is needed on other systems.

>
> Just a few thoughts:
>
> - Because multibyte is rather more memory hungry I think the user should
> still have the option to toggle the use of an 8bit path either in the
> interface or at compile time. This means where the UTF-8 patches replace
> paths we should preferably implement two paths.

The situation with the utf-8 patches is as follows:

In the editor, the utf-8 charset is converted to wchar. This requires 4 times
as much memory, but allows the code to stay almost the same.

In the rest of mc, the utf-8 charset is used directly and the memory
requirements are more or less the same as with 8bit charsets.

> - I suppose a lot of the code of the UTF-8 patch can be reused, only we
> will need to add iconv() calls in the appropriate places. libiconv is
> already expected so not much trouble with the make files there. Iconv
> should only be used for the multibyte path, not the 8bit path. Using the
> multibyte path would still enable users to translate from one 8bit
> charset to another.
> - Unsupported character substitution character should be an ini option
> (and define some defaults for all/many character sets). (I'm not sure
> question mark is supported in all character sets.)
> - Users should be able to set character set per directory (mount). Of
> course there should be a system wide default taken from the environment
> (but also overridable).
> - Copy/move dialogs should have a toggle to iconv the file name or do a
> binary name copy.
> - Maybe copy/move dialogs should also have a toggle to iconv file
> content, which could be quite usable for text files. A warning dialog on
> every copy/move (that the user explicitly has to disable) might be a
> good addition then, to help uninformed users avoiding to screw up their
> data.
>

The code in charsets.c is not compatible with utf-8 and needs to be
completely rewritten.
For example, the function convert_to_display(char *str) can't be used for
converting to utf-8, where the string actually grows.
With the current utf-8 patches, charsets can't be used in utf-8 locales.

-- 
Vladimir Nadvornik
developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                    e-mail: [EMAIL PROTECTED]
Lihovarská 1060/12                      tel:+420 284 028 967
190 00 Praha 9                          fax:+420 284 028 951
Czech Republic                          http://www.suse.cz

_______________________________________________
Mc-devel mailing list
http://mail.gnome.org/mailman/listinfo/mc-devel
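
To illustrate the convert_to_display(char *str) remark above: the sketch
below assumes (from the signature only) that such a function rewrites the
caller's buffer in place. That works between 8bit charsets, where every
character stays one byte, but not when the target is utf-8, because each
byte >= 0x80 becomes two bytes. The names latin1_utf8_length() and
latin1_to_utf8_dup() are hypothetical and not part of mc; ISO-8859-1 is
used only because its mapping to utf-8 needs no lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of mc: number of bytes (excluding the
 * terminating NUL) an ISO-8859-1 string needs once re-encoded as UTF-8.
 * Bytes below 0x80 stay one byte; all others become two bytes. */
size_t latin1_utf8_length(const char *src)
{
    size_t n = 0;

    assert(src != NULL);
    for (const unsigned char *p = (const unsigned char *)src; *p; p++)
        n += (*p < 0x80) ? 1 : 2;
    return n;
}

/* Hypothetical alternative to an in-place converter: returns a newly
 * allocated UTF-8 copy of src (caller frees), or NULL if malloc fails. */
char *latin1_to_utf8_dup(const char *src)
{
    char *dst;
    unsigned char *out;

    assert(src != NULL);
    dst = malloc(latin1_utf8_length(src) + 1);
    if (dst == NULL)
        return NULL;

    out = (unsigned char *)dst;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        if (*p < 0x80) {
            *out++ = *p;                    /* ASCII is unchanged */
        } else {
            *out++ = 0xC0 | (*p >> 6);      /* lead byte: 110000xx */
            *out++ = 0x80 | (*p & 0x3F);    /* continuation: 10xxxxxx */
        }
    }
    *out = '\0';
    return dst;
}
```

A 4-byte Latin-1 string such as "caf\xE9" needs 5 bytes in utf-8, so the
result cannot be written back into the original buffer; returning a new
string (or taking an explicit output buffer and size) avoids the overflow.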