> I don't think that's the case. I think that quite simply provision has been
> made for proper url-encoding but that 1) updating user-agents takes time,
> and 2) POST data shouldn't use url-encoding but instead use
> multipart/form-data because it makes a hell of a lot more sense nowadays.
Uh, and exactly how do you think URL encoding could possibly specify a character set? Remember, it has to be usable in a GET request! I.e., the URL itself would have to specify the charset; GET doesn't have a "Content-Type:" header per se. Granted, in a POST it could, but then it wouldn't be URL encoding!!!

The whole "form" thing was a poorly thought-out hack, and URL-encoding the form data was a horrible idea in the first place. I suspect it was just a convenience thing: whoever was writing the first browser that had a form used whatever encoding code he already had, which was the routine for encoding a URL. Hence the stupidity of a protocol that had to mirror a bad implementation... A story repeated far too often!

> > Realistically even UTF-8 is a hack. ALL the
> > software and standards need to be updated, badly. Ideally all software
> > should be able to deal with any incoming encoding, and really everything
> > should be UTF-16 internally. At least then you have a fighting chance of
> > representing an encoding in a consistent internal form. I'd give it about
> > 40 years...
>
> No, UTF-8 isn't a hack, it's a well thought out encoding that makes it
> possible for a lot of text data to be forward compatible even when it
> wasn't created to be so. [snip]

Yeah, UTF-8 is a hack. Consider this: if the world were starting with a clean slate right now, designing software and encodings from scratch, there would be one encoding that worked for ALL text, UTF-16 (OK, unless you're Klingon or using really obscure Chinese characters outside the Basic Multilingual Plane). All code would deal with it, etc. UTF-8 exists because it's a compatibility hack, period. People may have thought up other "reasons" for it, but endianness matters not a bit, and data size is dealt with by compression (which, if you think about it, is all UTF-8 really is: compressed UTF-16). In any case, if you cared about the raw size of your data, you wouldn't touch XML with a 1000' pole to begin with! ;o)
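To make the charset problem concrete, here is a small Python sketch (not from the original thread; the sample string "café" is just an illustration) using the standard library's `urllib.parse`. The same text percent-encodes to different bytes depending on which character set the client used, and nothing in the encoded result says which one it was, so the receiver has to guess.

```python
from urllib.parse import quote, unquote

# The same text, percent-encoded from two different byte encodings.
text = "café"
utf8_form = quote(text, encoding="utf-8")      # 'caf%C3%A9'
latin1_form = quote(text, encoding="latin-1")  # 'caf%E9'

# Neither string carries a charset label, so the receiver must guess.
print(utf8_form, latin1_form)

# Guessing wrong garbles the text in both directions.
print(unquote(utf8_form, encoding="latin-1"))  # 'cafÃ©' (mojibake)
print(unquote(latin1_form, encoding="utf-8"))  # 'caf\ufffd' (replacement char)
```

A server that assumes UTF-8 mangles the Latin-1 form and vice versa; in a POST a charset could in principle be declared in the Content-Type header, but in a GET URL there is nowhere to put it.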
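For the UTF-8 vs. UTF-16 size argument, a rough Python sketch comparing encoded byte counts for a few sample strings (the samples are chosen here for illustration, not taken from the thread). It uses the `utf-16-le` codec because Python's plain `utf-16` codec prepends a byte-order mark. The last sample also shows that UTF-16 is itself variable-width: a character outside the Basic Multilingual Plane, such as U+20000 (a rare CJK ideograph), needs a surrogate pair, i.e. 4 bytes.

```python
# Encoded byte counts for the same text in UTF-8 and UTF-16 (LE, no BOM).
samples = {
    "ASCII": "hello",
    "Latin": "caf\u00e9",         # "café"
    "CJK (BMP)": "\u6f22\u5b57",  # two common ideographs
    "CJK Ext. B": "\U00020000",   # outside the BMP: surrogate pair in UTF-16
}

sizes = {}
for label, s in samples.items():
    u8 = len(s.encode("utf-8"))
    u16 = len(s.encode("utf-16-le"))
    sizes[label] = (u8, u16)
    print(f"{label:12} utf-8={u8:2} utf-16={u16:2}")
```

Which encoding is smaller depends on the text: ASCII-heavy data favors UTF-8, while BMP ideographs favor UTF-16, and characters outside the BMP cost 4 bytes in both.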
