Graeme Geldenhuys wrote:
On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
Please specify "Finding"; a code snippet would be nice.
Knock yourself out...
https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas
Take a look at UTF8Copy() or UTF8Insert() etc.
I didn't mean the implementation, but the *task* to perform in
application code.
in FPC, until now. Give an example of UTF-8 code that would become
*more* complicated with UTF-16.
Consider a Copy()-type function where you want to copy a Unicode
codepoint (think of a single character as you see it on the screen,
ignoring combining diacritics for now) out of a string.
Again, *why* would you ever want to do that? It sounds to me like
extracting bits from floating point values :-(
UTF8Copy() as defined
above will do that correctly, irrespective of whether the codepoint is in
the BMP or a Supplementary Plane, and whether the character is represented
by 1, 2, 3 or 4 bytes.
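For illustration, a minimal sketch of what a codepoint-based copy on UTF-8 data has to do, written in Python rather than Pascal and working on raw bytes. The names `utf8_copy` and `utf8_seq_len` and the 1-based `start`/`count` parameters are assumptions modelled on Pascal's Copy(); this is not the fpGUI implementation, and it only inspects lead bytes without validating continuation bytes.

```python
def utf8_seq_len(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` occupies."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx: outside the BMP
    raise ValueError(f"not a UTF-8 lead byte: 0x{lead:02X}")


def utf8_copy(data: bytes, start: int, count: int) -> bytes:
    """Copy `count` codepoints beginning at codepoint `start` (1-based, like Copy()).

    Only lead bytes are inspected; continuation bytes are not validated.
    """
    pos = 0
    # Skip the first start-1 codepoints, one whole sequence at a time.
    for _ in range(start - 1):
        if pos >= len(data):
            break
        pos += utf8_seq_len(data[pos])
    begin = pos
    # Then advance over `count` codepoints to find the end of the slice.
    for _ in range(count):
        if pos >= len(data):
            break
        pos += utf8_seq_len(data[pos])
    return data[begin:pos]
```

The caller counts in codepoints, while the function internally tracks byte offsets; a copy that runs past the end of the string simply returns whatever is left.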
Why restrict such a function to UTF-8? For working with *logical*
characters, a set of functions is needed that does not rely on character
indices. A StartIndex parameter IMO indicates bad design :-(
The functions can be easily overloaded to work with AnsiChar and
WideChar string arguments, or even UCS4Char, if you like.
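One possible reading of such an index-free API, sketched in Python: an iterator that yields one logical character (codepoint) at a time from encoded bytes, so the caller never handles byte or code-unit offsets. Python has no overloading in the Pascal sense, so the encoding is passed as a parameter instead. `logical_chars` and `take` are hypothetical names; they are built on the standard `codecs.getincrementaldecoder`.

```python
import codecs
from itertools import islice
from typing import Iterator


def logical_chars(data: bytes, encoding: str) -> Iterator[str]:
    """Yield the codepoints of `data` one at a time, whatever its encoding."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        # Feed one byte at a time; the decoder buffers partial sequences
        # (multi-byte UTF-8, UTF-16 surrogate pairs) until they are complete.
        yield from decoder.decode(data[i:i + 1])
    # Flush; raises UnicodeDecodeError if the input ends mid-sequence.
    yield from decoder.decode(b"", final=True)


def take(chars: Iterator[str], n: int) -> str:
    """Return the next `n` logical characters; the caller tracks no offsets."""
    return "".join(islice(chars, n))
```

For example, `take(logical_chars(data, "utf-16"), 3)` returns the first three characters whether or not any of them needs a surrogate pair.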
With UTF-16 you need to check if the UTF-16 string is Little Indian or
Big Indian (UTF-16LE or UTF-16BE),
This has to be done only on input from a file, where every external
encoding should be converted into the internal representation.
BTW, it's "Endian", not "Indian" or "Chinese" ;-)
whether the codepoint has a surrogate
pair or not. All in all, a lot more complex than UTF-8.
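A minimal sketch of the UTF-16 work described above, in Python: pick the byte order from a BOM (defaulting to big-endian when there is none, as RFC 2781 suggests), then combine surrogate pairs into codepoints. `decode_utf16` is a hypothetical helper, not an FPC or fpGUI routine.

```python
def decode_utf16(data: bytes) -> list[int]:
    """Decode UTF-16 bytes into a list of codepoints."""
    # Byte order: a leading BOM decides; without one, assume big-endian.
    order = "big"
    if data[:2] == b"\xff\xfe":
        order, data = "little", data[2:]
    elif data[:2] == b"\xfe\xff":
        data = data[2:]
    if len(data) % 2:
        raise ValueError("odd number of bytes in UTF-16 data")
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]

    codepoints = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at unit {i}")
            low = units[i + 1]
            # Combine the pair into a Supplementary Plane codepoint.
            codepoints.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at unit {i}")
        else:
            codepoints.append(u)  # BMP codepoint, a single code unit
            i += 1
    return codepoints
```

If a step like this runs once when data is read, the rest of the program never sees byte order or surrogates.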
Sorry, UTF-8 and UTF-16 only provide different encodings for the same
Unicode codepoints. Mixing Char and Codepoint indices and counts is never
a good idea. With that in mind, it's no problem to perform the same
task on any encoding.
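To illustrate keeping codepoint counts and code-unit indices apart, here is a sketch in Python where one copy loop serves both UTF-8 and UTF-16; only the function that reports how many code units a sequence occupies differs. `copy_codepoints`, `utf8_units` and `utf16_units` are hypothetical names, and no validation of malformed input is attempted.

```python
from typing import Callable, Sequence


def utf8_units(lead: int) -> int:
    """UTF-8: number of bytes in the sequence that starts with `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: 0x{lead:02X}")


def utf16_units(lead: int) -> int:
    """UTF-16: a high surrogate starts a 2-unit pair; anything else is 1 unit."""
    return 2 if 0xD800 <= lead <= 0xDBFF else 1


def copy_codepoints(units: Sequence[int], start: int, count: int,
                    seq_len: Callable[[int], int]) -> Sequence[int]:
    """Copy `count` codepoints from codepoint `start` (1-based).

    `start` and `count` are codepoint positions; `pos` is a code-unit index.
    The two are kept in separate variables and never mixed.
    """
    pos = 0
    for _ in range(start - 1):
        if pos >= len(units):
            break
        pos += seq_len(units[pos])
    begin = pos
    for _ in range(count):
        if pos >= len(units):
            break
        pos += seq_len(units[pos])
    return units[begin:pos]
```

Passing the UTF-8 bytes of "aé€😀b" with `utf8_units`, or a list of its 16-bit code units with `utf16_units`, extracts the same three codepoints starting at the second one: 9 bytes in one case, 4 code units in the other.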
DoDi
--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus