> I don't think that's the case. I think that quite simply provision has been
> made for proper url-encoding but that 1) updating user-agents takes time,
> and 2) POST data shouldn't use url-encoding but instead use
> multipart/form-data because it makes a hell of a lot more sense nowadays.

Uh, and exactly how do you think URL encoding could possibly specify a 
character set? Remember, it has to be usable in a GET request! I.e., the URL 
itself would have to specify the charset; GET doesn't have a "Content-Type:" 
per se. Granted, in a POST it could, but then it wouldn't be URL encoding!!! 
The whole "form" thing was a poorly thought-out hack, and the idea of URL 
encoding the form was a horrible one in the first place. I suspect it was 
just a matter of convenience: whoever wrote the first browser that had a 
form used whatever encoding code he already had, which was the routine for 
encoding a URL, and so the protocol ended up having to mirror the bad 
implementation... A story repeated far too often!
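
To make that concrete, here's a rough sketch (Python, purely illustrative) of 
why percent-encoding can't carry a charset: the same character turns into 
different byte sequences depending on which charset the sender happened to 
pick, and nothing in the encoded form records that choice.

    from urllib.parse import quote

    text = "café"

    # Percent-encoding is defined over bytes, so the charset is chosen
    # *before* encoding and then silently thrown away.
    print(quote(text.encode("utf-8")))    # caf%C3%A9
    print(quote(text.encode("latin-1")))  # caf%E9
    # The receiver just sees percent-escaped bytes and has to guess
    # which charset produced them.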



>
> > Realistically even UTF-8 is a hack. ALL the
> > software and standards need to be updated, badly. Ideally all software
> > should be able to deal with any incoming encoding, and really everything
> > should be UTF-16 internally. At least then you have a fighting chance of
> > representing an encoding in a consistent internal form. I'd give it about
> > 40 years...
>
> No, UTF-8 isn't a hack, it's a well thought out encoding that makes it
> > possible for a lot of text data to be forward compatible even when it
> wasn't created to be so. [snip]

Yeah, UTF-8 is a hack. Consider this: if the world were starting with a clean 
slate right now, designing software and encodings from scratch, there would be 
one encoding that worked for ALL text, UTF-16 (ok, unless you're Klingon 
or using really obscure Chinese). All code would deal with it, etc. UTF-8 
exists because it's a compatibility hack, period. People may have thought up 
other "reasons" for it, but endianness matters not a bit, and the size of the 
data is dealt with by compression (which, if you think about it, is all UTF-8 
really is: compressed UTF-16). In any case, if you cared about the raw size of 
your data, you wouldn't touch XML with a 1000' pole to begin with! ;o)
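
For what it's worth, here's a quick sketch (Python again, illustrative only) 
of the compatibility angle: plain ASCII is byte-for-byte valid UTF-8 already, 
while UTF-16 doubles it, and non-ASCII text just pays a few extra bytes in 
UTF-8 instead.

    samples = ["plain ASCII text", "naïve café résumé", "日本語のテキスト"]

    for s in samples:
        utf8 = s.encode("utf-8")
        utf16 = s.encode("utf-16-le")  # little-endian, no BOM
        print(f"{s!r}: utf-8 = {len(utf8)} bytes, utf-16 = {len(utf16)} bytes")

    # The whole "compatibility" story in one line: ASCII bytes are already UTF-8.
    print(b"plain ASCII text".decode("utf-8"))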
