It's a shame UTF-8 wasn't made the standard in Delphi. It's commonly used in audio file tags, for example, which I have to deal with.
My software needs to search for songs by artist or title, and it sounds like I'm going to have problems where the text looks identical but was entered differently in different parts of the world, using all sorts of 3rd-party software.

Ross.

-----Original Message-----
From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of Todd
Sent: Wednesday, 24 November 2010 11:27 AM
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions

Hi John

You can find out whether a unicode string is inside the BMP by converting it to UTF-32 and checking that the new string is twice the length in bytes of the original (UTF-16) string.

> A user could specifically choose to enter that character in either form -
> this is unlikely, yes. Or, two users using the same codepage could choose to
> enter the character differently.
>
> Or if your data is coming from two separate external sources.
>
> The *only* way to be sure is to normalise before processing.
>
Agreed. That will eliminate any issues with composite codepoints.

>> You only ever get issues if you cross codepage boundaries
>> (like for example if you have users in different countries
>> storing data in a database - which is why international
>> databases often use UTF-8 to store data instead of their
>> native charactersets).
>>
> This makes no sense at all to me.
>
> "ö" encoded as #$006F + #$0308 **OR** #$00f6 even in UTF-8. Whether you
> encode using UTF-8, UTF-16 or UTF-32, a single accented character codepoint
> vs a character followed by a diacritic are still two distinct "character"
> sequences.
>
True. I think the point is that UTF-8 is the most compact format without data loss, regardless of whether the codepoints are composite or not.

Todd.
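For what it's worth, the "normalise before processing" advice and the BMP check quoted above can be sketched roughly as follows. This is only an illustration in Python, not Delphi, and the helper names (`normalize_for_search`, `same_text`, `is_bmp_only`) are made up for the example. It assumes NFC is an acceptable normal form for tag text; a real search would probably want case folding as well.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Return the NFC form of text, so "o" + U+0308 becomes U+00F6."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return normalize_for_search(a) == normalize_for_search(b)


def is_bmp_only(text: str) -> bool:
    """True if every code point fits in a single UTF-16 code unit.

    Inside the BMP, UTF-32 uses 4 bytes per code point and UTF-16 uses 2,
    so the UTF-32 byte length is exactly twice the UTF-16 byte length.
    A code point outside the BMP takes 4 bytes in both encodings.
    """
    utf16_bytes = len(text.encode("utf-16-le"))
    utf32_bytes = len(text.encode("utf-32-le"))
    return utf32_bytes == 2 * utf16_bytes


composed = "\u00f6"      # single precomposed code point
decomposed = "o\u0308"   # letter followed by a combining diaeresis

print(composed == decomposed)            # False
print(same_text(composed, decomposed))   # True
```

The two forms stay different byte sequences in UTF-8, UTF-16 and UTF-32 alike, so the choice of encoding does not help with searching; only normalising both sides before comparing does.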
_______________________________________________
NZ Borland Developers Group - Delphi mailing list
Post: delphi@delphi.org.nz
Admin: http://delphi.org.nz/mailman/listinfo/delphi
Unsubscribe: send an email to delphi-requ...@delphi.org.nz with Subject: unsubscribe