On Mon, Oct 17, 2005 at 06:12:25PM +0100, Matthew Garrett wrote:

> MTAs shouldn't be interpreting any utf-8. MUAs should only care about 
> getting the content-type header correct, and optionally converting to 
> the "simplest" character set.

MUAs also need to transmogrify any funny heathen characters in the
subject line.  MDAs need fiddling so that they can examine those
subjects usefully.  There are approximately thirty trillion other
programs which need updating anyway - anything which deals with
information at any level higher than the byte.

cat would probably be OK.  tr, sed, awk, cut, head, tail all need fixing
because of their assumptions that character == byte, but then they need
fixing regardless of whether we use hateful utf-8 or less hateful large
equal-length characters.
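
To make the character != byte point concrete, here is a rough Python
sketch.  The sample string is made up for illustration, and the byte
slice only imitates what a byte-oriented cut would do; it is not how
any particular tool is implemented.

```python
# Illustrative only: the sample string is an assumption, not from this thread.
# Escapes are used so the accented letters are single precomposed code points.
text = "na\u00efve caf\u00e9"

utf8 = text.encode("utf-8")        # variable width: 1 to 4 bytes per code point
utf32 = text.encode("utf-32-be")   # fixed width: 4 bytes per code point, no BOM

char_count = len(text)             # counts code points
utf8_bytes = len(utf8)             # the two accented letters take 2 bytes each
utf32_bytes = len(utf32)           # always 4 * char_count

# A tool that assumes character == byte cuts after 3 bytes, not 3 characters,
# and here that lands in the middle of a two-byte sequence.
byte_cut = utf8[:3]
char_cut = text[:3]

try:
    byte_cut.decode("utf-8")
    byte_cut_is_valid = True
except UnicodeDecodeError:
    byte_cut_is_valid = False

print(char_count, utf8_bytes, utf32_bytes, byte_cut, char_cut, byte_cut_is_valid)
```

Either encoding makes the byte count differ from the character count,
which is why the byte-oriented tools need work in both cases.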

>                               Terminals and string manipulation 
> libraries need rewriting, yes. The number of them that exist is quite 
> small compared to the number of applications that store strings.
> 
> UCS-4 breaks little assumptions like "A byte with a null in is the end 
> of a string". That's fairly hateful.

And replaces it with "0x00000000 is the end of a string".  This is still
== 0.  Of course, having a magic number of any kind mark the end of a
string is hateful, and is only needed because certain languages aren't
clever enough to store length and text.
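
A hedged sketch of both sides of that argument, in Python.  The helper
names (c_strlen, ucs4_strlen, length_prefixed, read_length_prefixed) are
hypothetical and only imitate the conventions being discussed; the
4-byte big-endian length prefix is one arbitrary choice among many.

```python
import struct

def c_strlen(buf: bytes) -> int:
    """Imitate a byte-oriented strlen: count bytes before the first zero byte."""
    return buf.index(0) if 0 in buf else len(buf)

def ucs4_strlen(buf: bytes) -> int:
    """Count 32-bit units before the first all-zero 32-bit unit."""
    for i in range(0, len(buf) - 3, 4):
        if buf[i:i + 4] == b"\x00\x00\x00\x00":
            return i // 4
    return len(buf) // 4

def length_prefixed(text: str) -> bytes:
    """Store length and text: a 4-byte big-endian byte count, then UTF-8 data."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def read_length_prefixed(buf: bytes) -> str:
    """Read back a string written by length_prefixed (hypothetical format)."""
    (n,) = struct.unpack_from(">I", buf, 0)
    return buf[4:4 + n].decode("utf-8")

# "Hi" as big-endian UCS-4, followed by a 32-bit zero terminator.
ucs4 = "Hi".encode("utf-32-be") + b"\x00\x00\x00\x00"

byte_view_len = c_strlen(ucs4)     # byte-oriented code stops at the first zero byte
unit_view_len = ucs4_strlen(ucs4)  # 32-bit-aware code finds both characters

# With an explicit length there is no magic terminator, so an embedded
# NUL is just another character.
round_trip = read_length_prefixed(length_prefixed("a\x00b"))

print(byte_view_len, unit_view_len, repr(round_trip))
```

The byte-oriented view breaks on UCS-4 because the high bytes of ASCII
characters are zero, the 32-bit view works as long as every piece of
code compares whole units, and the length-prefixed form sidesteps the
question.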

-- 
David Cantrell | London Perl Mongers Deputy Chief Heretic

PLEASE NOTE: This message was meant to offend everyone equally,
regardless of race, creed, sexual orientation, politics, choice
of beer, operating system, mode of transport, or their editor.
