> What we need to do is support the full 32-bit Unicode
> character set but we shouldn't use UTF-32 to do it
> since we'll waste vast amounts of memory space since
> characters above 16-bit are very very rare. We need
> to instead switch to UTF-8 internally for everything.
> This is the right answer for several reasons which
> have all been covered in depth on several mailing
> lists

Since the characters have a variable width, UTF-8 processing is very CPU-intensive for everything but the basic 7-bit ASCII charset. It is not meant to be used internally by applications; it is meant as an encoding for communication between applications over 8-bit channels. Internally we need to use a fixed-width encoding, so if we want to support 32-bit Unicode, we have to redefine UT_UCSChar to long.
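
To make the variable-width point concrete, here is a rough sketch in C. It is not AbiWord code: utf8_seq_len, utf8_offset_of, ucs4_char and ucs4_at are made-up names for illustration, and the decoder only checks lead and continuation bytes (it does not reject overlong forms or surrogates). Finding the n-th character in UTF-8 needs a scan from the start of the string, while a fixed-width buffer is a plain array index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence that
 * starts with lead byte b, or 0 if b cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: 7-bit ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* stray continuation or invalid byte */
}

/* Byte offset of the n-th character (0-based) in a NUL-terminated
 * UTF-8 string. Every preceding character has to be walked, so the
 * cost is O(n). Returns (size_t)-1 if n is out of range or the
 * input is malformed. */
static size_t utf8_offset_of(const char *s, size_t n)
{
    size_t off = 0;

    while (n > 0) {
        size_t len, i;

        if (s[off] == '\0')
            return (size_t)-1;
        len = utf8_seq_len((unsigned char)s[off]);
        if (len == 0)
            return (size_t)-1;
        /* Each trailing byte must look like 10xxxxxx. A NUL fails
         * this check, so we never read past the terminator. */
        for (i = 1; i < len; i++)
            if (((unsigned char)s[off + i] & 0xC0) != 0x80)
                return (size_t)-1;
        off += len;
        n--;
    }
    return s[off] == '\0' ? (size_t)-1 : off;
}

/* Fixed-width alternative (assumed 32-bit character type):
 * the n-th character is a single O(1) array access. */
typedef uint32_t ucs4_char;

static ucs4_char ucs4_at(const ucs4_char *s, size_t n)
{
    return s[n];
}
```

For the three-character string 'a', U+00E9, 'z', the UTF-8 form is four bytes long and utf8_offset_of has to step over the two-byte sequence to find that the third character starts at byte 3. The fixed-width form spends four bytes on every character but reads the third one directly with ucs4_at(w, 2).
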
I agree that having a 32-bit UT_UCSChar would waste a lot of memory, and I would like to see a case made first for why we need to support 32-bit Unicode.

Tomas
