Re: [fpc-devel] Delphi new AnsiStrings are incredibly broken :-(

Hans-Peter Diettrich Fri, 14 Oct 2011 04:25:21 -0700

Apart from the mentioned implementation flaws, I came across severe problems with the new AnsiString *model* in general. Let's play around with the Pos() function, which certainly is an indispensable part of any string handling.


A general function
 function Pos(SubStr: T1; Str: T2): integer;

should return the character index of SubStr in Str, i.e. Str[i] should definitely be the beginning of SubStr within Str.

It should also be possible to find the end of SubStr within Str, in order to e.g. return the remainder of the text.

With multiple coexisting string encodings we have to solve the following problems:

A reasonable result, i.e. the index in the given string of the given encoding T2, requires converting the search string SubStr into exactly that encoding. This takes two conversions, from T1 into UTF-8 (or UTF-16) and then into T2. Clearly this can be avoided by using strings of only one encoding, but what about string literals? When a string literal has to be converted, it most probably ends up in UTF-8/16 encoding, which would cause the Unicode version of Pos() to be called, giving a wrong result, i.e. an index into the converted UTF-16 copy rather than into Str. Even if we assume that string literals are stored as native (CP_ACP) strings, or as Unicode, which actually depends on compiler directives, a couple of overloaded Pos() functions would have to be added if an unwanted conversion of *both* arguments into UTF-16 is to be avoided.


The only possible solution would IMO be a
 function Pos(SubStr: UnicodeString; Str: RawByteString): integer;

in the *hope* that this version takes precedence over the all-Unicode version.


But when we have the beginning of the substring, how do we find its end?

Here Length(SubStr) is of little help, since it represents the number of bytes in encoding T1, which is useless with T2. So we need a feature to determine the length of a string in any (supported) encoding, like:

  function EncodedLength(s: string; cp: TEncoding): integer;
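
A minimal sketch of such a helper, assuming Free Pascal 3.x and the TEncoding class from SysUtils. The parameter is declared as UnicodeString rather than string (an assumption of this sketch, so that it does not depend on what "string" means in the current compiler mode), and the test string is built character by character to stay independent of the source file encoding. Note that ä occupies one UTF-16 code unit but two bytes in UTF-8, so the three results differ.

```pascal
program EncodedLengthSketch;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager for conversions on Unix
  SysUtils;

// Hypothetical helper following the signature proposed above:
// the number of bytes s would occupy once encoded as cp.
function EncodedLength(const s: UnicodeString; cp: TEncoding): Integer;
begin
  Result := cp.GetByteCount(s);
end;

var
  s: UnicodeString;
begin
  // "Bär", spelled out so the result does not depend on the source encoding
  SetLength(s, 3);
  s[1] := 'B';
  s[2] := WideChar($00E4);
  s[3] := 'r';

  WriteLn('UTF-16 code units: ', Length(s));
  WriteLn('bytes as UTF-8:    ', EncodedLength(s, TEncoding.UTF8));
  WriteLn('bytes as UTF-16:   ', EncodedLength(s, TEncoding.Unicode));
end.
```

This still costs one conversion per call, so it answers the length question without reducing the number of conversions.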

Or we add a function
 function EndPos(SubStr: T1; Str: T2): integer;
returning the index of the char following SubStr in Str.

Or we combine both, into

 function Pos2(SubStr: T1; Str: T2; out begIndex, endIndex: integer): boolean;

with the result indicating whether SubStr was found in Str.
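
For illustration only, here is how the combined function might look for T1 = UnicodeString and T2 = RawByteString, assuming Free Pascal 3.x. It converts the search string once, explicitly, into the code page stored in Str and then searches byte-wise, so both indexes refer to Str itself. The names needle, hay and hayU are made up for this sketch. A Str tagged CP_NONE is not handled, and a byte-wise search can produce false matches in the middle of a character for encodings that are not self-synchronizing.

```pascal
program Pos2Sketch;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // widestring manager, needed for code page conversions on Unix
{$endif}

// Sketch: begIndex is the byte index of SubStr in Str, endIndex the index of
// the first byte following it. Both are 0 if SubStr was not found.
function Pos2(const SubStr: UnicodeString; const Str: RawByteString;
  out begIndex, endIndex: Integer): Boolean;
var
  needle: RawByteString;
begin
  // The one explicit conversion: bring SubStr into Str's own encoding.
  needle := UTF8Encode(SubStr);
  SetCodePage(needle, StringCodePage(Str), True);

  begIndex := Pos(needle, Str);  // both arguments RawByteString: byte-wise search
  Result := begIndex > 0;
  if Result then
    endIndex := begIndex + Length(needle)
  else
    endIndex := 0;
end;

var
  hayU, sub: UnicodeString;
  hay: RawByteString;
  b, e: Integer;
begin
  // "Bär!" and "är", built without relying on the source file encoding
  SetLength(hayU, 4);
  hayU[1] := 'B'; hayU[2] := WideChar($00E4); hayU[3] := 'r'; hayU[4] := '!';
  SetLength(sub, 2);
  sub[1] := WideChar($00E4); sub[2] := 'r';

  hay := UTF8Encode(hayU);  // the string to search, tagged as UTF-8

  if Pos2(sub, hay, b, e) then
    WriteLn('found at byte ', b, ', remainder: ', Copy(hay, e, MaxInt))
  else
    WriteLn('not found');
end.
```

The caller never needs Length(SubStr), which is the point of returning endIndex; the conversion of SubStr on every call remains.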

But even if we implement all that, and use it *everywhere* in our code, the chance for any number of implicit encoding conversions remains :-(

Do you see any chance to reduce the number of possible conversions, other than by using only one single encoding throughout RTL and application code?

But what's the use of strings with a stored encoding, then? Except for strict compatibility with a flawed Delphi model and implementation, that may be dropped again in the next Delphi version?


DoDi

