Graeme Geldenhuys wrote:
On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
Please specify "Finding"; a code snippet would be nice.

Knock yourself out...


https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas


Take a look at UTF8Copy() or UTF8Insert() etc.

I didn't mean the implementation, but the *task* to perform in application code.


in FPC, until now. Give an example of UTF-8 code, which would become *more* complicated with UTF-16.

Consider a Copy()-type function where you want to copy a Unicode
codepoint (think single character as you see it on screen, ignoring
combining diacritics for now) out of a string.

Again, *why* would you ever want to do that? It sounds to me like extracting bits from floating point values :-(

UTF8Copy() as defined
above will do that correctly, irrespective of whether the codepoint is
in the BMP or a Supplementary Plane, or whether the character is
represented by 1, 2, 3 or 4 bytes.

Why restrict such a function to UTF-8? For working with *logical* characters, a set of functions is needed that does not rely on character indices. IMO a StartIndex parameter indicates bad design :-( The functions can easily be overloaded to work with AnsiChar and WideChar string arguments, or even UCS4Char, if you like.
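To illustrate the suggestion above in Python (an assumption for illustration only; the helpers codepoints() and first_word() are hypothetical and not FPC or fpGUI functions): the encoding is handled in exactly one place, and the text operation walks logical characters without any StartIndex:

```python
from itertools import takewhile
from typing import Iterable, Iterator


def codepoints(data: bytes, encoding: str) -> Iterator[int]:
    """Yield the codepoints of `data`; only this function knows the encoding."""
    # Python's str iterates by codepoint, never by UTF-16 code unit.
    for ch in data.decode(encoding):
        yield ord(ch)


def first_word(cps: Iterable[int]) -> str:
    """Index-free: consume logical characters up to (not including) the first space."""
    return "".join(chr(cp) for cp in takewhile(lambda cp: cp != 0x20, cps))


text = "\U0001F600\u20ac rest"
for enc in ("utf-8", "utf-16-le", "utf-16-be"):
    # prints True for each encoding: the same logic, no character indices
    print(enc, first_word(codepoints(text.encode(enc), enc)) == "\U0001F600\u20ac")
```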

With UTF-16 you need to check if the UTF-16 string is Little Indian or
Big Indian (UTF-16BE or UTF-16LE),

This has to be done only on input from a file, where every external encoding should be converted into the internal representation.

BTW, it's "Endian", not "Indian" or "Chinese" ;-)
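A minimal Python sketch of converting once on input, as described above. decode_input() is a hypothetical helper; the BOM constants come from Python's codecs module, and files without a BOM are assumed to be UTF-8:

```python
import codecs

# Checked in order. Assumption: only UTF-8 and UTF-16 input is expected; a UTF-32-LE
# BOM (FF FE 00 00) starts with the UTF-16-LE BOM and would need to be tested first.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_input(raw: bytes, default: str = "utf-8") -> str:
    """Convert external bytes into the internal representation (here: a Python str).
    Byte order is decided once, from the BOM if present, and never again afterwards."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode(default)


little = codecs.BOM_UTF16_LE + "Endian".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "Endian".encode("utf-16-be")
print(decode_input(little) == decode_input(big) == "Endian")  # prints True
```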


whether the codepoint is encoded as a surrogate
pair or not. All in all, a lot more complex than UTF-8.

Sorry, but UTF-8 and UTF-16 only provide different encodings for the same Unicode codepoints. Mixing Char and codepoint indices and counts is never a good idea. With that in mind, it's no problem to perform the same task in any encoding.
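A small Python illustration of that point (the counting helpers are hypothetical): the same four codepoints occupy 10 UTF-8 bytes but 5 UTF-16 code units, so a byte or code-unit count must never be mistaken for a character count, while counting codepoints is equally simple in both encodings:

```python
text = "a\u00e9\u20ac\U0001F600"  # 4 codepoints
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]


def utf8_codepoint_count(data: bytes) -> int:
    # Every byte that is not a 10xxxxxx continuation byte starts a codepoint.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf16_codepoint_count(code_units) -> int:
    # Every code unit that is not a low (trailing) surrogate DC00..DFFF starts a codepoint.
    return sum(1 for u in code_units if not 0xDC00 <= u <= 0xDFFF)


print(len(utf8), len(units))                                     # prints 10 5
print(utf8_codepoint_count(utf8), utf16_codepoint_count(units))  # prints 4 4
```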

DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
