On Mon, Oct 17, 2005 at 06:12:25PM +0100, Matthew Garrett wrote:
> MTAs shouldn't be interpreting any utf-8. MUAs should only care about
> getting the content-type header correct, and optionally converting to
> the "simplest" character set.
MUAs also need to transmogrify any funny heathen characters in the subject
line. MDAs need fiddling so that they can examine those subjects usefully.
There are approximately thirty trillion other programs which need updating
anyway - anything which deals with information at any level higher than
the byte. cat would probably be OK. tr, sed, awk, cut, head, tail all need
fixin because of their assumption that character == byte, but then they
need fixing regardless of whether we use hateful utf-8 or less hateful
large equal-length characters.

> Terminals and string manipulation
> libraries need rewriting, yes. The number of them that exist is quite
> small compared to the number of applications that store strings.
>
> UCS-4 breaks little assumptions like "A byte with a null in is the end
> of a string". That's fairly hateful.

And replaces it with "0x00000000 is the end of a string". This is still
== 0. Of course, having a magic number of any kind mark the end of a
string is hateful, and is only needed because certain languages aren't
clever enough to store length and text.

-- 
David Cantrell | London Perl Mongers Deputy Chief Heretic

PLEASE NOTE: This message was meant to offend everyone equally,
regardless of race, creed, sexual orientation, politics, choice of beer,
operating system, mode of transport, or their editor.
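The character-versus-byte and string-terminator points above can be illustrated with a short sketch. It is not from the original post. It assumes Python 3, uses big-endian UTF-32 as a stand-in for UCS-4, and the helper names `c_strlen`, `ucs4_strlen`, `pack_counted` and `unpack_counted` are hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical helpers illustrating the thread's points; not a real library API.

def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen(): count bytes up to the first 0x00 byte."""
    n = buf.find(b"\x00")
    return len(buf) if n == -1 else n

def ucs4_strlen(buf: bytes) -> int:
    """Count 32-bit units up to the first 0x00000000 unit (the UCS-4 terminator)."""
    for i in range(0, len(buf) - 3, 4):
        if buf[i:i + 4] == b"\x00\x00\x00\x00":
            return i // 4
    return len(buf) // 4

def pack_counted(data: bytes) -> bytes:
    """Store length and text: a 4-byte big-endian length prefix, then the bytes."""
    return struct.pack(">I", len(data)) + data

def unpack_counted(buf: bytes) -> bytes:
    """Read the length prefix and return exactly that many bytes; NULs are fine."""
    (n,) = struct.unpack_from(">I", buf)
    return buf[4:4 + n]

text = "caf\u00e9"                   # "café" with a precomposed e-acute
utf8 = text.encode("utf-8")          # b'caf\xc3\xa9'
print(len(text), len(utf8))          # 4 characters, but 5 bytes

# Byte-oriented truncation (think `head -c 4`) splits the multibyte character.
print(utf8[:4].decode("utf-8", errors="replace"))   # 'caf' plus U+FFFD

# UCS-4 text is full of zero bytes, so a byte-wise terminator test fails at once,
# while a test for a whole 0x00000000 unit still works.
ucs4_z = text.encode("utf-32-be") + b"\x00\x00\x00\x00"
print(c_strlen(ucs4_z), ucs4_strlen(ucs4_z))        # 0 4

# A counted string carries embedded NULs without any magic terminator.
print(unpack_counted(pack_counted(b"a\x00b")))      # b'a\x00b'
```

The counted form corresponds to the "store length and text" approach the post prefers; the cost is a fixed-size length header per string.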