> I think you are confusing Canonical & Normalized versions 
> of the same Unicode string (in the example s1 is canonical, 
> s2 is normalized) and the effect of local codepage conversion.

Yep, and for the record I think this is a big problem with the way Embarcadero 
implemented Unicode.

By pursuing the "Unicode is a no-brainer" approach (facilitating easy migration 
for ASCII apps) they have obfuscated the fact that Unicode is far from simple.  
Or at least doing it right is.

Danny Thorpe opined years ago that it made a lot of sense to do 64-bit and 
Unicode in one go as a big-bang breaking change, leaving the 32-bit, ANSI VCL 
product behind as a legacy platform.  Danny Thorpe always was a clever guy!  ;)


 
> The "ö" can be written as a compound #$006F + #$0308 in 
> canonical format ... and as #$00f6 in the "normalized" 
> format. For most normal applications it just doesn't really 
> matter either way because a user that is inputting text under 
> his local codepage will always do it the same way

A user could specifically choose to enter that character in either form - this 
is unlikely, yes.  Or, two users using the same codepage could choose to enter 
the character differently.

Or your data could be coming from two separate external sources, each of 
which uses a different form.

The *only* way to be sure is to normalise both strings to the same form before 
comparing or otherwise processing them.
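
To make that concrete, here is a minimal sketch.  It is in Python rather than 
Delphi purely for brevity, using the standard unicodedata module; the variable 
names are mine.  It assumes NFC as the target form, but any single form applied 
consistently to both sides would do.

```python
import unicodedata

# Two spellings of "ö": base letter + combining diaeresis,
# and the single precomposed character.
decomposed = "\u006F\u0308"
precomposed = "\u00F6"

# A plain comparison sees two different code point sequences.
raw_equal = decomposed == precomposed

# Normalising both sides to the same form (NFC here) makes them comparable.
nfc_equal = (unicodedata.normalize("NFC", decomposed)
             == unicodedata.normalize("NFC", precomposed))

print(raw_equal, nfc_equal)  # prints: False True
```

The same idea applies whether the strings came from keyboard input, a file or 
a database: normalise at the boundary, then compare.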


> You only ever get issues if you cross codepage boundaries 
> (like for example if you have users in different countries 
> storing data in a database - which is why international 
> databases often use UTF-8 to store data instead of their 
> native charactersets).

This makes no sense at all to me.

"ö" can be encoded as #$006F + #$0308 **OR** as #$00F6, even in UTF-8.  Whether 
you encode using UTF-8, UTF-16 or UTF-32, a single precomposed accented 
codepoint and a base character followed by a combining diacritic are still two 
distinct "character" sequences.
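
A quick way to check this for yourself, again sketched in Python for brevity 
(the names are mine, and the little-endian variants are chosen only to avoid a 
BOM in the output): encode both spellings in each encoding and compare the 
bytes.

```python
# Two spellings of "ö": decomposed (o + combining diaeresis) and precomposed.
decomposed = "\u006F\u0308"
precomposed = "\u00F6"

results = {}
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # Encoding changes the byte layout, not the underlying code points,
    # so the two spellings stay distinct in every encoding.
    results[enc] = (decomposed.encode(enc), precomposed.encode(enc))
    print(enc, results[enc][0].hex(), results[enc][1].hex())
```

In every case the two byte sequences differ, so picking UTF-8 for storage does 
nothing to reconcile the two forms.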


_______________________________________________
NZ Borland Developers Group - Delphi mailing list
Post: delphi@delphi.org.nz
Admin: http://delphi.org.nz/mailman/listinfo/delphi
Unsubscribe: send an email to delphi-requ...@delphi.org.nz with Subject: 
unsubscribe
