On 28 October 2013 11:56, John McKown <[email protected]> wrote:
> Well, I still like the _concept_ of an "internal character set" instead of
> using ISO8859-1, or CP-037, or ???? .

No argument. The problem is that the Java developers took their
understanding of the then-nascent UNICODE standard and hard-coded it
into both the language and the JVM. To be fair, this was in the early
1990s, and they were early adopters of UNICODE at a time when there
was still considerable resistance.

> Personally, if it were me, I'd be
> looking at UTF-8 for internal coding. And somehow address the
> lexicographical sorting / comparison (if that's the proper phrase - I'll
> defer to others if I'm wrong)

It's often called "culturally correct", but there's nothing wrong with
lexicographical in this context.
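
To make the distinction concrete, here is a minimal Java sketch,
assuming nothing beyond the standard library. The class and method
names (SortOrderDemo, binaryOrder, collatedOrder) are made up for
illustration. It sorts the same words once by raw UTF-16 code unit
value and once with java.text.Collator for an English locale:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; the names here are illustrative only.
class SortOrderDemo {
    // Plain String ordering: compares UTF-16 code units, no cultural rules.
    static List<String> binaryOrder(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Collator ordering: applies the collation rules of the given locale.
    static List<String> collatedOrder(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute, written as an escape to sidestep source encoding.
        List<String> words = Arrays.asList("zebra", "\u00e9clair", "Zulu", "apple");
        System.out.println(binaryOrder(words));
        System.out.println(collatedOrder(words, Locale.ENGLISH));
    }
}
```

The binary sort puts "Zulu" ahead of "apple" and the accented word
after "zebra", which is rarely what an end user expects. The collated
sort groups letters regardless of case and accent.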

> using some sort of locale information. Each "I/O definition"
> would define the locale of the external representation.
> This would determine how to transform it to/from the internal UTF-8
> representation.

Be careful not to muddle "locale" with character encoding. They are
largely independent. The UNIXy concept of locale carries baggage like
sort order(s), date and time of day representation, currency format,
etc.
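
A small Java sketch of that independence, using only standard library
calls. The class and method names (EncodingVsLocaleDemo, decode,
formatAmount) are hypothetical. The encoding decides how external
bytes become characters; the locale decides how a value is presented.
Neither call needs to know about the other:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class; the names here are illustrative only.
class EncodingVsLocaleDemo {
    // Decoding depends only on the character encoding of the external bytes.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Presentation depends only on the locale, not on any encoding.
    static String formatAmount(double amount, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // \u00e9 is e-acute
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Different external representations, same internal string.
        System.out.println(latin1.length + " bytes vs " + utf8.length + " bytes");
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1)
                .equals(decode(utf8, StandardCharsets.UTF_8)));

        // Same value, different presentation per locale.
        System.out.println(formatAmount(1234.5, Locale.US));
        System.out.println(formatAmount(1234.5, Locale.GERMANY));
    }
}
```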

> It might even be nice if the language had a "string" data
> type which has the encoded locale of the data for that instance of the
> string value. Said locale [cw]ould be inherited when the string was read
> from an external source, or assigned from another string.

It's far from clear that locale should follow strings around.
Character encoding certainly needs to, but in a UNICODE-only world
there should be just the one. The other aspects of the locale notion
generally follow the end user around, not the data, and not the
particular instance of the desktop/mobile browser or OS. There are
plenty of use cases for end users in different countries -- or simply
with different cultural or personal preferences -- to view the same
data with different results on sort and search.

But requirements clash. For example, MS Project offers a choice
(among many others) of which day the week begins on. My week starts
on Monday, but (I imagine for reasons only of tradition) the Project
default is Sunday. This attribute attaches to the document, where I
think it should be part of the user preferences. So I find myself
confused viewing a document created by someone else that has the
weekend split across the two ends of the week, with Sunday at the
start and Saturday at the end. Yet if I switched to my own preferred
view, it would be confusing to discuss the schedule over the phone
with someone using a different one.
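
For what it's worth, this is roughly how the "user preference"
approach looks in Java, where the first day of the week is derived
from a locale rather than stored with the data. This is only a sketch
using java.time.temporal.WeekFields; the class and method names
(WeekStartDemo, firstDayOfWeek) are made up, and it says nothing
about how MS Project itself works:

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical demo class; the names here are illustrative only.
class WeekStartDemo {
    // The week start is looked up from the viewer's locale,
    // not from anything attached to the document being viewed.
    static DayOfWeek firstDayOfWeek(Locale userLocale) {
        return WeekFields.of(userLocale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println("US viewer:     " + firstDayOfWeek(Locale.US));
        System.out.println("French viewer: " + firstDayOfWeek(Locale.FRANCE));
    }
}
```

Two users opening the same schedule would then each see their own
week layout, which is exactly the situation that makes the phone
conversation awkward.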

I digress, again.

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
