I guess I used "locale" when I really meant whatever something like
ISO8859-1 is (a character encoding? I think IBM calls it a CCSID). It
would be neat to be able to read a "string" from one file that uses
ISO8859-1, compare it with a "string" read from another file that uses
IBM-1047, and have the comparison give the "proper" result. That part
may indeed be a locale matter, since the French and the Swiss use
versions of French that sort the same glyph in different orders.
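A minimal Python sketch of the first half of that idea, assuming the language's own string type plays the role of the "internal character set": the bytes from each file are decoded with that file's encoding, and the comparison is done on the decoded text, never on the raw bytes. Python's standard codecs include latin-1 (ISO8859-1) and cp037 but, as far as I know, no IBM-1047 codec, so cp037 stands in for IBM-1047 here. The two EBCDIC pages differ in only a few positions (mostly brackets and similar punctuation), none of which appear in the sample. The helper name `read_text` is hypothetical.

```python
# Sketch: decode bytes from two differently encoded sources into one internal
# representation (a Python str), then compare the text rather than the bytes.
# Assumption: cp037 stands in for IBM-1047, which Python does not ship.

def read_text(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: turn external bytes into the internal string type."""
    return raw.decode(encoding)

# The same word as it might arrive from two different files.
from_ascii_side = "Café".encode("latin-1")   # ISO8859-1 bytes
from_ebcdic_side = "Café".encode("cp037")    # EBCDIC bytes (cp037 as a stand-in)

# The raw bytes differ...
print(from_ascii_side == from_ebcdic_side)   # False

# ...but the decoded text is identical, so the comparison gives the expected answer.
a = read_text(from_ascii_side, "latin-1")
b = read_text(from_ebcdic_side, "cp037")
print(a == b)                                # True

# If the internal form were UTF-8, both would map to the same byte sequence.
print(a.encode("utf-8") == b.encode("utf-8"))  # True
```

Note that this only settles *equality*; ordering is a separate question, which is where locale comes in.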

I guess what I would like is to defer the "culturally correct" results to
the underpinnings of the computer language, rather than require each and
every programmer to write the correct code in each and every program they
write. That way, a person could write program "x" in France, and an
English speaker would still get useful results from it.
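One way to picture deferring this to the language: the decoded text stays the same, and the sort order is a separate, selectable rule. The Python sketch below is a deliberately simplified two-level collation key (base letters first, then accents), not the full Unicode Collation Algorithm; a real runtime would take its rules from a library such as ICU. The `backwards_accents` flag mimics the French-style rule, used by some French collation tailorings, of comparing accent differences from the end of the word. Weighting each accent by its combining mark's code point is an assumption: it happens to rank acute before circumflex, as the default Unicode tables do, but it is not right in general.

```python
import unicodedata

def sort_key(word: str, backwards_accents: bool = False):
    """Simplified two-level collation key (a sketch, not the full UCA).

    Level 1: base letters with accents stripped, case-folded.
    Level 2: the accents on each base letter, weighted by the combining
             mark's code point (an assumption, not real UCA weights).
    The original word is a final tie-breaker so the order is deterministic.
    """
    bases: list[str] = []
    accents: list[tuple[int, ...]] = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the combining mark to the preceding base letter.
            accents[-1] += (ord(ch),)
        else:
            bases.append(ch.casefold())
            accents.append(())
    if backwards_accents:
        # French-style rule: compare accent differences from the end of the word.
        accents.reverse()
    return ("".join(bases), tuple(accents), word)


words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=sort_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: sort_key(w, backwards_accents=True)))
# ['cote', 'côte', 'coté', 'côté']
```

The same four strings come out in two different orders depending only on the rule chosen, which is the sense in which sort order belongs to the locale and not to the character encoding.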

OK, OK, we'll all do our programming in the only universal language:
Klingonaase. <grin/>



On Mon, Oct 28, 2013 at 11:23 AM, Tony Harminc <[email protected]> wrote:

> On 28 October 2013 11:56, John McKown <[email protected]>
> wrote:
> > Well, I still like the _concept_ of an "internal character set" instead
> > of using ISO8859-1, or CP-037, or ????.
>
> No argument. The problem is that the Java developers used their
> understanding of the then-nascent UNICODE standard, and hard-coded it
> in both the language and the JVM. To be fair, this was in the early
> 1990s, and they were early adopters of UNICODE at a time when there
> was still considerable resistance.
>
> > Personally, if it were me, I'd be
> > looking at UTF-8 for internal coding. And somehow address the
> > lexicographical sorting / comparison (if that's the proper phrase - I'll
> > defer to others if I'm wrong)
>
> It's often called "culturally correct", but there's nothing wrong with
> lexicographical in this context.
>
> > using some sort of locale information. Each "I/O definition"
> > would define the locale of the external representation.
> > This would determine how to transform it to/from the internal UTF-8
> > representation.
>
> Be careful not to muddle "locale" with character encoding. They are
> largely independent. The UNIXy concept of locale carries baggage like
> sort order(s), date and time of day representation, currency format,
> etc.
>
> > It might even be nice if the language had a "string" data
> > type which has the encoded locale of the data for that instance of the
> > string value. Said locale [cw]ould be inherited when the string was read
> > from an external source, or assigned from another string.
>
> It's far from clear that locale should follow strings around.
> Certainly character encoding needs to, but in a UNICODE-only world,
> there should be just the one. But the other aspects of the locale
> notion generally follow the end-user around, not the data or the
> instance of the desktop/mobile browser or OS. There are plenty of use
> cases for end users in different countries -- or simply with different
> cultural or personal preferences -- to view the same data with
> different results on sort and search. But requirements clash. For
> example, MS Project offers choices as to (among many others) which day
> the week begins on. My week starts on Monday, but (I imagine for
> reasons only of tradition) the Project default is to start on Sunday.
> This attribute attaches to the document, where I think it should be
> part of the user preferences. So I find myself confused viewing a
> document created by someone else that has the weekend split between
> Saturday and Sunday, but if I had my own preferred view it would be
> confusing discussing the schedule over the phone with someone with
> another view.
>
> I digress, again.
>
> Tony H.
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
>



-- 
This is clearly another case of too many mad scientists, and not enough
hunchbacks.

Maranatha! <><
John McKown
