I read most of the discussion, and I think there is no way around a string type that contains an encoding field. First, it also makes it possible to support non-UTF encodings or UTF-32. Having the encoding field does not mean that every target supports every encoding. If an encoding is not supported, the target might either fall back to some default operation, as the current widestring manager does, or raise an exception. Such a string type requires a manager that not only stores the procedures for handling the string type but also records which encoding to prefer, or even to use exclusively. Combining this with a few ifdefs and compiler switches makes the approach very flexible and fast, and lets everybody (FPC people, Lazarus, MSE) adapt things to their needs.
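To make the idea a bit more concrete, here is a rough model of such a string type and its manager. It is written in Python rather than Pascal only to keep it short and runnable, and it is not how FPC lays anything out. All names (TaggedString, StringManager, strict) are made up for illustration, and "count the bytes" and "leave the string untouched" as default operations are just my assumptions about what a fallback could do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus an encoding field (hypothetical model)."""
    data: bytes
    encoding: str  # a Python codec name, e.g. "utf-8" or "utf-16-le"


class UnsupportedEncoding(Exception):
    """Raised by a strict manager for an encoding it does not handle."""


class StringManager:
    """Holds the string routines plus the encoding this target prefers."""

    def __init__(self, preferred, supported, strict=False):
        self.preferred = preferred
        self.supported = set(supported) | {preferred}
        self.strict = strict  # True: raise; False: use a default operation

    def _is_supported(self, s):
        if s.encoding in self.supported:
            return True
        if self.strict:
            raise UnsupportedEncoding(s.encoding)
        return False

    def to_preferred(self, s):
        """Convert to the preferred encoding if the source is supported."""
        if s.encoding == self.preferred or not self._is_supported(s):
            return s  # nothing to do, or default operation: leave untouched
        text = s.data.decode(s.encoding)
        return TaggedString(text.encode(self.preferred), self.preferred)

    def length(self, s):
        """Length in code points, or in bytes as the default operation."""
        if self._is_supported(s):
            return len(s.data.decode(s.encoding))
        return len(s.data)


if __name__ == "__main__":
    manager = StringManager(preferred="utf-8", supported=["utf-16-le"])
    s = TaggedString("h\u00e9llo".encode("utf-16-le"), "utf-16-le")
    print(manager.to_preferred(s).encoding)
    print(manager.length(s))
```

A target that wants to use one encoding solely would pass an empty supported list, and strict=True turns the silent fallback into an exception.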
Just an example: to solve the indexing problem efficiently when using an encoding field (this is not about surrogates), we could do the following: introduce a compiler switch {$unicodestringindex default,byte,word,dword}. In default mode the compiler gets a shift value from the encoding field (the field is 4 bytes anyway and could be split into 1 byte for the shift, 2 bytes for the encoding and 1 byte reserved). In the other modes the compiler uses the given element size when indexing. For example, a Tiburon (or whatever it is called) switch could set this to word.

The big advantage of this approach is that, if desired, you really need every procedure only once. For example, Linux would get only UTF-8 routines by default; UTF-16 is converted to UTF-8 on entry to the helper procedures if needed. Usually no conversion would be necessary, because you seldom see UTF-16 in Linux applications, so only the check whether the input strings are really UTF-8 is needed. This check is very cheap because the data is already in a cache line anyway. Moreover, this variable-encoding approach also lets people whose languages take more memory in UTF-8 than in UTF-16 (by numbers, the majority of mankind) use UTF-8 or UTF-16 as needed to save memory, with only a few modifications.

I know this approach contains some hacks and requires some work, but I think it is the only way to solve things once and for all.

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
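The indexing scheme from the example above can be modelled in a few lines. Again this is Python only as a runnable sketch, not compiler code. The field split (1 byte shift, 2 bytes encoding, 1 byte reserved) is the one proposed above; everything else is an assumption of mine: the function names, the use of Windows code page numbers as encoding ids, little-endian code units, and 0-based indexes (Pascal strings are 1-based).

```python
import struct

# Hypothetical encoding ids: Windows code page numbers, used here only
# as example values that fit into the 2-byte encoding part of the field.
CP_UTF8, CP_UTF16LE, CP_UTF32LE = 65001, 1200, 12000

# Shift per encoding: one code unit occupies (1 << shift) bytes.
SHIFT = {CP_UTF8: 0, CP_UTF16LE: 1, CP_UTF32LE: 2}

# Fixed shifts for the byte, word and dword modes of the proposed switch.
MODE_SHIFT = {"byte": 0, "word": 1, "dword": 2}


def make_header(encoding):
    """Pack the 4-byte field: 1 byte shift, 2 bytes encoding, 1 reserved."""
    return struct.pack("<BHB", SHIFT[encoding], encoding, 0)


def code_unit(header, data, index, mode="default"):
    """Return the code unit at a 0-based index (no surrogate handling)."""
    if mode == "default":
        shift = header[0]         # read from the encoding field at run time
    else:
        shift = MODE_SHIFT[mode]  # would be a compile-time constant
    offset = index << shift
    width = 1 << shift
    if index < 0 or offset + width > len(data):
        raise IndexError(index)
    return int.from_bytes(data[offset:offset + width], "little")


if __name__ == "__main__":
    text = "abc\u20ac"  # a, b, c, euro sign
    utf16 = text.encode("utf-16-le")
    utf8 = text.encode("utf-8")
    print(hex(code_unit(make_header(CP_UTF16LE), utf16, 3)))  # 0x20ac
    print(hex(code_unit(make_header(CP_UTF8), utf8, 3)))      # 0xe2
```

In default mode the cost is one extra byte load and a shift per index operation; in the fixed modes the compiler would not have to look at the field at all.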