On Tue, 16 Nov 2010, Marco van de Voort wrote:

>> Furthermore I think that, in detail, Unicode string handling should not
>> be based on single characters at all, but should instead use (sub)strings
>> all over, covering multi-byte character representations, ligatures etc.
>> as well.
>
> This is dog slow. You can make such a library for special purposes, but
> for most day-to-day use it is overkill. The most common string operations
> the average programmer performs are searching for substrings and then
> splitting on them, both of which can be done perfectly well in UTF-8.
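
For illustration, a minimal Free Pascal sketch of that byte-level approach
(assuming the source file is saved as UTF-8): because UTF-8 is
self-synchronizing, a match of a complete valid sequence can never begin in
the middle of another character, so Pos/Copy/Delete can search and split on
raw bytes without decoding anything.

  program utf8split;
  {$mode objfpc}{$H+}           // AnsiString holds the raw UTF-8 bytes
  var
    S: AnsiString;
    P: SizeInt;
  begin
    S := 'naïve,Grüße,日本語';  // three comma-separated UTF-8 fields
    repeat
      P := Pos(',', S);         // byte-level search, no decoding
      if P = 0 then
        P := Length(S) + 1;     // no delimiter left: take the rest
      WriteLn(Copy(S, 1, P - 1));
      Delete(S, 1, P);          // drop the field and its delimiter
    until S = '';
  end.
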
>> Then the basic operations would be insertion and deletion of
>> substrings, in addition to substring extraction and concatenation.
>
> Basic operations with a capital B, yes: string support like in a Basic
> interpreter.

Indeed. JavaScript uses the 'substrings only' approach; it can hardly get
slower than that :( That is probably the reason regular expression handling
was built into the language: to make at least some things acceptably fast.

But in general there is something to be said for Hans-Peter's statements:
fixed-length character encodings simply make more sense from a computing
point of view.
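
To sketch that point (assuming FPC's UCS4String type and the standard
UTF8Decode and UnicodeStringToUCS4String conversions): with a fixed-length
encoding the Nth code point is a direct array access, while UTF-8 needs a
linear scan from the start. Note that even UCS-4 is only fixed-length per
code point, not per perceived character, once combining marks and ligatures
come into play.

  program fixedwidth;
  {$mode objfpc}{$H+}
  uses SysUtils;                 // IntToHex
  var
    U8: AnsiString;              // variable-width UTF-8 bytes
    U32: UCS4String;             // one UCS4Char element per code point
  begin
    U8 := 'Grüße';               // 5 code points in 7 UTF-8 bytes
    U32 := UnicodeStringToUCS4String(UTF8Decode(U8));
    // Length includes the terminating #0 element of UCS4String.
    WriteLn('code points: ', Length(U32) - 1);
    // O(1) indexing: the third code point, no scan required.
    WriteLn('3rd code point: U+', IntToHex(Ord(U32[2]), 4));  // U+00FC
  end.
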
Michael.
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel