On 28 October 2013 11:56, John McKown <[email protected]> wrote:
> Well, I still like the _concept_ of an "internal character set" instead of
> using ISO8859-1, or CP-037, or ???? .

No argument. The problem is that the Java developers used their
understanding of the then nascent UNICODE standard, and hard-coded it in
both the language and the JVM. To be fair, this was in the early 1990s,
and they were early adopters of UNICODE at a time when there was still
considerable resistance.

> Personally, if it were me, I'd be
> looking at UTF-8 for internal coding. And somehow address the
> lexicographical sorting / comparison (if that's the proper phrase - I'll
> defer to others if I'm wrong)

It's often called "culturally correct", but there's nothing wrong with
lexicographical in this context.

> using some sort of locale information. Each "I/O definition"
> would define the locale of the external representation.
> This would determine how to transform it to/from the internal UTF-8
> representation.

Be careful not to muddle "locale" with character encoding. They are
largely independent. The UNIXy concept of locale carries baggage like
sort order(s), date and time of day representation, currency format, etc.

> It might even be nice if the language had a "string" data
> type which has the encoded locale of the data for that instance of the
> string value. Said locale [cw]ould be inherited when the string was read
> from an external source, or assigned from another string.

It's far from clear that locale should follow strings around. Certainly
character encoding needs to, but in a UNICODE-only world, there should be
just the one. But the other aspects of the locale notion generally follow
the end-user around, not the data or the instance of the desktop/mobile
browser or OS. There are plenty of use cases for end users in different
countries -- or simply with different cultural or personal preferences --
to view the same data with different results on sort and search.

But requirements clash. For example, MS Project offers choices as to
(among many others) which day the week begins on. My week starts on
Monday, but (I imagine for reasons only of tradition) the Project default
is to start on Sunday. This attribute attaches to the document, where I
think it should be part of the user preferences. So I find myself confused
viewing a document created by someone else that has the weekend split
between Saturday and Sunday, but if I had my own preferred view it would
be confusing discussing the schedule over the phone with someone with
another view.

I digress, again.

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
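As a rough illustration of the point above that character encoding and locale are largely independent, here is a minimal Java sketch. It is not taken from the thread: the class and method names (EncodingVsLocale, decode, sortFor) are made up for the example, and the sample words are arbitrary. The encoding is applied once, where bytes enter the program; the locale is chosen per viewer, when the already-decoded strings are sorted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Illustrative only: the class and method names are assumptions for this sketch.
class EncodingVsLocale {

    // Encoding is an I/O concern: it says how external bytes map to characters.
    static String decode(byte[] external, Charset encoding) {
        return new String(external, encoding);
    }

    // Locale is a viewing concern: it says how a given reader wants strings ordered.
    // The input array is left untouched; a sorted copy is returned.
    static String[] sortFor(Locale viewer, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(viewer));
        return copy;
    }

    public static void main(String[] args) {
        // The same character (e-acute, U+00E9) arriving in two external encodings.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        String a = decode(latin1, StandardCharsets.ISO_8859_1);
        String b = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(a.equals(b)); // true: one internal representation

        // One set of strings, two viewers with different collation preferences.
        String[] words = {"zebra", "\u00e4pple", "apple"};
        System.out.println(Arrays.toString(sortFor(Locale.GERMAN, words)));
        System.out.println(Arrays.toString(sortFor(Locale.forLanguageTag("sv-SE"), words)));
    }
}
```

Nothing about the decoded strings records which encoding they came from, and nothing about them dictates a sort order. Whether the two sorted lists actually differ depends on the collation data shipped with the JDK in use; the point is only that the Locale is supplied by the viewer at sort time and does not travel with the data.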
