On Tue, Dec 07, 2004 at 10:17:17AM -0500, Daniel Burrows wrote:
> On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote:
> > And if the app already deals with charset conversions but assumes
> > iso-8859-1 input, then it's trivial to fix it to assume utf-8 input.
>
>   This is not true.
>
>   iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset.
> Storing and manipulating iso-8859-1 strings requires no changes to internal
> datatypes (only conversions for input and output); storing and manipulating
> Unicode means you have to switch to a completely different set of
> string-handling functions for all internal operations.

No, you do not have to do this. You can keep working with "char"; the
changes when switching to UTF-8 mostly have to deal with the fact that
one Unicode character may be represented by more than one char. This
means that you need to use a different strlen function, take care to
chop strings of char only at character boundaries, ensure that input
strings are actually valid UTF-8, etc.

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key:
  | \/¯|  http://atterer.net  |  0x888354F7
  ¯ '` ¯
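
To illustrate the three points in the reply above (a character-counting
strlen, chopping only at character boundaries, and validating input), here
is a minimal sketch in C that keeps plain "char" strings throughout. The
function names utf8_seq_len, utf8_is_valid, utf8_strlen and utf8_truncate
are made up for this example and are not taken from any library; the
validity rules follow the usual definition of well-formed UTF-8 (no
overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the well-formed UTF-8 sequence that
 * starts at s (n bytes available), or 0 if the sequence is invalid. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0) return 0;
    unsigned char c = s[0];
    if (c < 0x80) return 1;                      /* plain ASCII */
    if (c >= 0xC2 && c <= 0xDF)                  /* 2-byte sequence */
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {                /* 3-byte sequence */
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        if (c == 0xE0 && s[1] < 0xA0) return 0;  /* overlong encoding */
        if (c == 0xED && s[1] > 0x9F) return 0;  /* UTF-16 surrogate */
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4) {                /* 4-byte sequence */
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        if (c == 0xF0 && s[1] < 0x90) return 0;  /* overlong encoding */
        if (c == 0xF4 && s[1] > 0x8F) return 0;  /* above U+10FFFF */
        return 4;
    }
    return 0;  /* stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF */
}

/* Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
static int utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = strlen(s), i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(p + i, n - i);
        if (len == 0) return 0;
        i += len;
    }
    return 1;
}

/* Counts characters (code points) rather than bytes. Assumes s has already
 * been validated: every byte that is not a continuation byte starts one. */
static size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    return count;
}

/* Shortens s in place to at most max_bytes bytes without splitting a
 * character. Assumes valid UTF-8. Returns the new length in bytes. */
static size_t utf8_truncate(char *s, size_t max_bytes)
{
    size_t len = strlen(s), cut = max_bytes;
    if (len <= max_bytes) return len;
    /* If the first dropped byte is a continuation byte, back up to the
     * lead byte so the whole character is dropped. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80) cut--;
    s[cut] = '\0';
    return cut;
}
```

Everything here still operates on ordinary char arrays; only the places
that care about "characters" rather than bytes call the UTF-8-aware
helpers, and input would be checked once with utf8_is_valid before the
other two functions are used on it.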