Michael Schnell wrote:

That's why such loops should be disallowed with Unicode strings, as kind of low level string handling.
Not only this, but the normal user would like to do

  MyChar := MyString[Length(MyString)];

to get the last character of a string.

Which is simply nonsense with UTF-8/16 strings :-(
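
To illustrate why, here is a minimal sketch, assuming the string holds well-formed UTF-8 in an AnsiString. Utf8LastCodePoint is a hypothetical helper made up for this example, not an RTL function. Indexing with Length() yields only the trailing byte, while stepping back over the continuation bytes yields the whole code point:

```pascal
program LastCharDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the bytes of the last code point of a
// UTF-8 string. Assumes S is well-formed UTF-8.
function Utf8LastCodePoint(const S: AnsiString): AnsiString;
var
  I: Integer;
begin
  Result := '';
  I := Length(S);
  if I = 0 then
    Exit;
  // Step back over continuation bytes (bit pattern 10xxxxxx).
  while (I > 1) and ((Ord(S[I]) and $C0) = $80) do
    Dec(I);
  Result := Copy(S, I, Length(S) - I + 1);
end;

var
  S: AnsiString;
begin
  S := 'caf'#$C3#$A9;  // "cafe" with e-acute (U+00E9) as UTF-8 bytes C3 A9
  WriteLn(Ord(S[Length(S)]));             // 169 ($A9): only the trailing byte
  WriteLn(Length(Utf8LastCodePoint(S)));  // 2: both bytes of the code point
end.
```

Note that this still returns a code point, not necessarily what the user perceives as a character.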


Thus IMHO the Length function name should be dumped and two new functions (such as CharacterCount and ByteCount) should be introduced.

Length() has always returned the number of *physical* elements.
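
A small sketch of the difference, assuming UTF-8 stored in an AnsiString. Utf8CodePointCount is a hypothetical name chosen for this example (Lazarus' LazUTF8 unit offers UTF8Length for the same purpose, but the sketch avoids that dependency):

```pascal
program LengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in a UTF-8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  S := 'K'#$C3#$A4'se';            // "Kaese" with a-umlaut as UTF-8 bytes C3 A4
  WriteLn(Length(S));              // 5: physical elements (bytes)
  WriteLn(Utf8CodePointCount(S));  // 4: code points
end.
```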


For-Each loops may be acceptable as high level string handling, but with what type of the loop variable???

We would obviously need a UnicodeChar type that holds the 32-bit encoding.

Iff we ever want to support such functionality.
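
Just as a sketch of what such a loop would have to do under the hood, assuming UTF-8 input and a plain LongWord as the 32-bit loop variable: NextCodePoint is a hypothetical helper, does no validation, and is not an existing RTL routine.

```pascal
program DecodeDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: decodes the code point starting at byte index I
// of S and advances I past it. Assumes S is well-formed UTF-8.
function NextCodePoint(const S: AnsiString; var I: Integer): LongWord;
var
  B: Byte;
  Extra: Integer;
begin
  B := Ord(S[I]);
  Inc(I);
  if B < $80 then
  begin
    Result := B;            // single-byte (ASCII) sequence
    Extra := 0;
  end
  else if (B and $E0) = $C0 then
  begin
    Result := B and $1F;    // lead byte of a 2-byte sequence
    Extra := 1;
  end
  else if (B and $F0) = $E0 then
  begin
    Result := B and $0F;    // lead byte of a 3-byte sequence
    Extra := 2;
  end
  else
  begin
    Result := B and $07;    // lead byte of a 4-byte sequence
    Extra := 3;
  end;
  // Each continuation byte contributes its low 6 bits.
  while Extra > 0 do
  begin
    Result := (Result shl 6) or (Ord(S[I]) and $3F);
    Inc(I);
    Dec(Extra);
  end;
end;

var
  S: AnsiString;
  I: Integer;
  CP: LongWord;
begin
  S := 'a'#$C3#$A4#$E2#$82#$AC;  // 'a', U+00E4, U+20AC as UTF-8
  I := 1;
  while I <= Length(S) do
  begin
    CP := NextCodePoint(S, I);
    WriteLn('U+', HexStr(LongInt(CP), 4));  // U+0061, U+00E4, U+20AC
  end;
end.
```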


But the said "quirks" can't be handled by this. Up to now I don't understand whether, technically, these "quirks" are seen as a single Unicode character or as a sequence of Unicode characters. Nor do I understand how they can be used in a decent way, or whether they are necessary or just legacy.

The big mess starts with combinations of codepoints. No problems as long as the RTL functions deal with the physical storage of the codepoints, and nothing else.
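
A minimal example of such a combination, assuming UTF-8 in an AnsiString: the same visible character can be stored either as one precomposed code point or as a base letter followed by a combining mark, and a comparison of the physical storage sees two different strings.

```pascal
program CombiningDemo;
{$mode objfpc}{$H+}

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$A9;            // U+00E9 (e with acute) as UTF-8
  Decomposed  := 'e'#$CC#$81;         // U+0065 + U+0301 (combining acute)
  WriteLn(Length(Precomposed));       // 2 bytes, 1 code point
  WriteLn(Length(Decomposed));        // 3 bytes, 2 code points
  WriteLn(Precomposed = Decomposed);  // FALSE: the stored bytes differ
end.
```

Anything beyond that (treating both forms as equal, or as one character) needs normalization, which is a layer above the physical storage.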

DoDi


--
_______________________________________________
Lazarus mailing list
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
