On Thu, 4 May 2017 09:56:18 +0100 Tony Whyman via Lazarus <[email protected]> wrote:
> [...]
> I don't believe that string indexing even works for UTF8 strings at
> present - at least not in a simple s[i] way.

It exists, the same as for UTF-16 strings: s[i] indexes code units (bytes
for UTF-8, words for UTF-16), not code points.

> Is it really that much overhead to have a simple codepage check before
> calling the correct function to index a string? The obvious optimisation
> would be to check for UTF8, then UTF16 then the Default codepage and
> then the rest. Or perhaps UTF16 first for Windows. With register level
> code you are talking about very few actual machine level operations.

A char cannot hold a widechar, so such an index operator would have to
return widechar. And in most cases [] is used in loops, so the compiler
would have to add checks on every access. It would be faster to convert
the string to UnicodeString at the beginning and back at the end - a
technique that many RTL functions use to support any string type.

> To me, a unified string type would have the advantage that:
>
> - You would only have one managed string type "string" (and hence avoids
> the confusion that exists today).

You can avoid the confusion by using only one string encoding, either
UTF-8 or UTF-16. The problem is that existing libraries often support
only one.

> [...]
> - The only time that a programmer has to think about the character
> encoding is when writing code that interacts directly with an external
> interface.

That's already possible, with LazUTF8. The problem is legacy code and
sharing code with Delphi.

> [...]

Mattias

-- 
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus-ide.org/listinfo/lazarus
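P.S. A minimal sketch of the convert-once technique mentioned above
(function name and details are illustrative, assuming FPC 3.x with
{$mode objfpc}): convert the input to UnicodeString a single time, use
plain [] indexing on the UTF-16 code units without any per-access
encoding checks, then convert back once at the end.

```pascal
// Illustrative sketch only - name and ASCII-only logic are assumptions.
// Uppercases the first character if it is a..z; the point is the shape:
// one conversion in, free code-unit indexing, one conversion out.
function FirstCharUpper(const S: AnsiString): AnsiString;
var
  U: UnicodeString;
begin
  U := UnicodeString(S);                 // one conversion at the start
  if (Length(U) > 0) and (U[1] >= 'a') and (U[1] <= 'z') then
    U[1] := WideChar(Ord(U[1]) - 32);    // simple s[i], no codepage checks
  Result := AnsiString(U);               // one conversion back at the end
end;
```

Doing the two conversions at the boundaries is cheaper than letting the
compiler re-check the string's codepage on every [] access inside a loop.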
