> On Apr 12, 2019, at 12:58 PM, x <tam118...@hotmail.com> wrote:
>
> I’ve been asking myself if I could have done the above more efficiently as
> sqlite’s converting the original string then I’m converting it and copying
> it. While thinking about that I started to wonder how c++ handled utf8/16.
> E.g. To access the i’th character does it have to rattle through all previous
> I-1 characters to find the start of character i, how pointer arithmetic was
> handled when pointing to utf8/16 chars etc.
>
Basically, if you are dealing with a variable-width encoding (UTF-8 or UTF-16), then finding the nth character requires scanning the string from the start and counting character beginnings. If this is an important operation, you pay the cost of conversion and work in UCS-4, where every code point has the same width. On the other hand, UTF-8 has a lot of nice properties, such that it can be a fairly seamless upgrade for processing plain ASCII text, and it is reasonably efficient for typical text.

(There are a number of complications if you try to support ALL of Unicode, like composed characters, where several code points together define a single character. There you need to decide how you want to normalize, and you need some big character tables that describe how to do this.)
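
To make the scanning cost concrete, here is a minimal sketch of both approaches in C++. The function names (is_utf8_lead, utf8_length, utf8_offset_of, utf8_to_utf32) are made up for illustration and are not part of SQLite or the standard library. The code assumes the input is already valid UTF-8 and does no error checking, so treat it as an outline rather than a production decoder.

```cpp
#include <cstddef>
#include <string>

// A byte starts a code point unless it is a continuation byte (10xxxxxx).
inline bool is_utf8_lead(unsigned char b) { return (b & 0xC0) != 0x80; }

// Number of code points in s. Has to look at every byte: O(bytes).
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        if (is_utf8_lead(b)) ++count;
    }
    return count;
}

// Byte offset of the n-th code point (0-based), or std::string::npos if
// the string has fewer than n+1 code points. This is the linear scan:
// there is no way to jump straight to code point n.
std::size_t utf8_offset_of(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_utf8_lead(static_cast<unsigned char>(s[i]))) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return std::string::npos;
}

// One-time conversion to fixed-width UTF-32 (UCS-4). After this, out[n]
// is the n-th code point in constant time. Assumes valid UTF-8 input.
std::u32string utf8_to_utf32(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length is encoded in the high bits of the lead byte.
        std::size_t len = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2
                        : (b >> 4) == 0xE ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? b : (b & (0xFF >> (len + 1)));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For a string made of "a", U+00E9, U+20AC and "b" (7 bytes, 4 code points), utf8_offset_of has to walk the bytes to find that the third code point starts at byte 3, while the converted std::u32string can simply be indexed with [2]. Note that both versions index code points, not user-perceived characters, so the composed-character complication still applies.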