On Oct 20, 2019, at 9:20 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
> 
> Rowan, you're talking about Unicode codepoints; however, Unicode graphemes, 
> what typical humans consider to be characters, are sequences of 1..N 
> codepoints, example a letter plus an accent that get composed together, and 
> this is what takes those large tables; this is related to Unicode Normal 
> Forms, eg NFD vs NFC, and it's not about codepoint encodings like UTF-8 vs 
> UTF-16 etc. -- Darren Duncan

+1.  strlen() and character indexing on UTF-8 text are nontrivial:

    https://stackoverflow.com/q/6162484/142454

Points 10, 27, 46, 47, and 48 under “Assume Brokenness” are relevant here.
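
To make Darren's distinction concrete, here is a minimal sketch in Python 
(my choice of language, not anything from the thread).  It compares the 
byte length, the code point count, and a rough grapheme count for the same 
visible character in NFC and NFD.  The helper approx_grapheme_count() is 
hypothetical and only skips combining marks; real grapheme segmentation 
follows the rules in UAX #29 and is considerably more involved.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Rough approximation only: count code points that are not combining marks.

    This is NOT full UAX #29 segmentation; it ignores emoji sequences,
    Hangul jamo, and other multi-code-point clusters.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The same user-perceived character, "e" with an acute accent, in two forms.
nfc = unicodedata.normalize("NFC", "e\u0301")  # composed: single code point U+00E9
nfd = unicodedata.normalize("NFD", nfc)        # decomposed: "e" followed by U+0301

for label, s in (("NFC", nfc), ("NFD", nfd)):
    print(
        label,
        "bytes:", len(s.encode("utf-8")),       # what a C strlen() would report
        "code points:", len(s),                 # what Python's len() reports
        "approx graphemes:", approx_grapheme_count(s),
    )
```

The two strings render identically but compare unequal, and none of the 
three counts agree across both forms except the grapheme count, which is 
the one most people mean by "length."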

If you’re wondering why Perl’s handling of Unicode, and thus the accepted 
answer by one of Perl’s best practitioners, matters here on the SQLite mailing 
list (which is to say, not a Perl list), the point is that Perl’s 
implementation of Unicode is one of the best available among the set of 
languages designed before Unicode became popular, so if you want a model to 
learn lessons from, it’s one of the best you could pick.

If you read this answer and don’t learn anything, you’re either too ignorant to 
understand what you’ve read and need to shore up the basics first, too arrogant 
to learn, or one of a very small number of people who truly understand Unicode. 
I re-learn something every time I read this answer, because I’m not so deeply 
steeped in Unicode arcana as to retain it all long-term.

And it’s a *summary*!  More weirdness awaits!
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
