Here, there have been many long-winding and partly quite fruitless discussions on the implementation of the new Unicode-aware string type(s).

IMHO, before trying to decide on any implementation details, there should be a _very explicit_ decision on the general functionality.

I would suggest first deciding between what are (IMHO) the three sensible, mutually exclusive variants:


A)
Have the new string type(s) maintain the very strict typing paradigm Pascal usually imposes. IMHO this calls for multiple explicit string types such as ANSI_1252_String, UTF8_String, UTF16_String, etc., plus the appropriate single-character types. This of course allows for automatic conversion without ambiguity, because both the source and the target encoding are known from the types at compile time. As the types are mutually exclusive, passing a string to a parameter of a different string type will always trigger an automatic conversion, unless an appropriate overloaded function is defined. These string types of course will not need fields denoting their encoding and byte count per code element, as this information is part of the type itself.
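To make variant A concrete, here is a small model in Python (chosen only because it is compact and runnable; in FPC this would of course be compile-time typing, not runtime classes). The class names loosely mirror the ones suggested above but are hypothetical, and the `auto_convert` decorator is an assumed stand-in for the conversion the compiler would insert at a call site. What it shows is that the encoding belongs to the type, not to each instance, so no per-string encoding field is needed to convert unambiguously.

```python
import inspect


class EncodedString:
    """Shared plumbing only; the concrete subclasses are mutually exclusive."""

    ENCODING = ""  # fixed per subclass: the encoding belongs to the *type*
    __slots__ = ("data",)  # instances carry raw bytes only, no encoding field

    def __init__(self, text: str = ""):
        self.data = text.encode(self.ENCODING)

    @classmethod
    def convert(cls, value):
        # Unambiguous: source and target encodings both follow from the types.
        if type(value) is cls:
            return value
        return cls(value.data.decode(value.ENCODING))

    def text(self) -> str:
        return self.data.decode(self.ENCODING)


class Ansi1252String(EncodedString):
    ENCODING = "cp1252"
    __slots__ = ()


class Utf8String(EncodedString):
    ENCODING = "utf-8"
    __slots__ = ()


class Utf16String(EncodedString):
    ENCODING = "utf-16-le"
    __slots__ = ()


def auto_convert(func):
    """Model a call site: convert each string argument to its declared type."""
    sig = inspect.signature(func)

    def wrapper(*args):
        bound = sig.bind(*args)
        for name, value in list(bound.arguments.items()):
            target = sig.parameters[name].annotation
            if isinstance(target, type) and issubclass(target, EncodedString):
                bound.arguments[name] = target.convert(value)
        return func(*bound.args)

    return wrapper


@auto_convert
def utf16_units(s: Utf16String) -> int:
    # The body may rely on s holding UTF-16 data: 2 bytes per code unit.
    return len(s.data) // 2


word = Utf8String("Grüße")
print(len(word.data))     # 7 bytes in UTF-8
print(utf16_units(word))  # 5, after an implicit UTF-8 -> UTF-16 conversion
```

Passing a `Utf8String` where a `Utf16String` is declared converts silently; avoiding that conversion would require an overload of `utf16_units` taking a `Utf8String`, exactly as described for variant A.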

B)
Provide only a single string type that is properly dynamically typed. These strings of course will include fields denoting their encoding and byte count per code element. Here, conversions will happen when two differently encoded strings are combined in some operation. An empty string would be handled as having no predefined encoding, so that combining it with any other string will not force a conversion. As there is only one string type, function calls will never trigger a conversion, and the encoding of a function result is not predefined. Of course, a single character type is defined that can hold any encoding; presumably it would be dynamically encoded as well. To enforce that a string is provided in some definite encoding, an appropriate function (or compiler magic) is available.
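Variant B can be modelled the same way. In this Python sketch (hypothetical names such as `DynString`, `as_encoding` and `shout`), every string carries its encoding and element size at runtime, an empty string carries none, and a conversion only happens when two differently encoded strings meet. Which operand's encoding wins in a mixed concatenation is not specified above; the sketch simply assumes the left one does.

```python
from typing import Optional

# Hypothetical table: bytes per code element for the encodings modelled here.
ELEMENT_SIZE = {"cp1252": 1, "utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


class DynString:
    """The single string type: encoding and element size are runtime fields."""

    __slots__ = ("data", "encoding", "elem_size")

    def __init__(self, text: str = "", encoding: Optional[str] = None):
        if not text:
            # An empty string has no predefined encoding.
            self.data, self.encoding, self.elem_size = b"", None, 0
            return
        if encoding is None:
            raise ValueError("a non-empty string needs an encoding")
        self.data = text.encode(encoding)
        self.encoding = encoding
        self.elem_size = ELEMENT_SIZE[encoding]

    def text(self) -> str:
        return self.data.decode(self.encoding) if self.data else ""

    def as_encoding(self, encoding: str) -> "DynString":
        """Enforce a definite encoding (the 'function or compiler magic')."""
        if not self.data or self.encoding == encoding:
            return self
        return DynString(self.text(), encoding)

    def __add__(self, other: "DynString") -> "DynString":
        # Combining with an empty string never forces a conversion.
        if not other.data:
            return self
        if not self.data:
            return other
        # Differently encoded operands: convert the right one to the left
        # one's encoding (left-wins is an assumed policy, not part of the text).
        other = other.as_encoding(self.encoding)
        result = DynString()
        result.data = self.data + other.data
        result.encoding, result.elem_size = self.encoding, self.elem_size
        return result


def shout(s: DynString) -> DynString:
    # Only one parameter type exists, so calling this never converts; the
    # result simply inherits whatever encoding the argument happened to have.
    return s + DynString("!", s.encoding) if s.data else s


a = DynString("abc", "utf-8")
b = DynString("déf", "utf-16-le")
print((a + b).encoding, (a + b).text())  # utf-8 abcdéf
print((b + a).encoding)                  # utf-16-le
print((DynString() + b) is b)            # True
```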

C)
Handle these strings similarly to Pascal objects, which allow for inheritance (and provide the appropriate operator overloading). So there is a "Parent" string type (aka RAW) that has no predefined encoding, and multiple "Child" types that each enforce a particular encoding. As the parent type of course needs to include fields denoting its encoding and byte count per code element, the child types are implemented in the same way (which in theory allows for "intersexual" strings, i.e. strings whose data is encoded correctly, but differently than their type denotes). Here (for non-intersexual strings) conversion can be handled unambiguously (using either the type <if it is not the "Parent" type> or the dynamically stored encoding), and a non-RAW target type of ":=" might request yet another conversion. A function definition can either use the Parent type (RAW), so that calling it will not trigger a conversion, or use one of the Child string types, so that calling it triggers an appropriate automatic conversion. Of course, single-character types matching the Parent and all Child string types are necessary.
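For variant C, the Parent/Child relationship maps naturally onto class inheritance. The Python sketch below uses hypothetical names (`RawString`, `Utf8String`, `Utf16String`, and an `assign` classmethod standing in for `:=`). It shows that every type keeps the runtime encoding field, that a RAW target never converts, that a Child target converts from the dynamically stored encoding, and how an "intersexual" string can arise.

```python
from typing import Optional


class RawString:
    """Parent type (RAW): no enforced encoding, but always a runtime field."""

    TYPE_ENCODING: Optional[str] = None  # None means "any encoding"
    __slots__ = ("data", "encoding")

    def __init__(self, data: bytes = b"", encoding: Optional[str] = None):
        self.data = data
        # The dynamic field exists in every type; children default it to theirs.
        self.encoding = encoding if encoding is not None else self.TYPE_ENCODING

    @classmethod
    def from_text(cls, text: str, encoding: Optional[str] = None):
        enc = encoding if encoding is not None else cls.TYPE_ENCODING
        if enc is None:
            raise ValueError("a RAW string needs an explicit encoding")
        return cls(text.encode(enc), enc)

    def text(self) -> str:
        return self.data.decode(self.encoding) if self.data else ""

    def is_consistent(self) -> bool:
        # False for an "intersexual" string: valid data, but in an encoding
        # other than the one its (child) type denotes.
        return self.TYPE_ENCODING in (None, self.encoding)

    @classmethod
    def assign(cls, value: "RawString") -> "RawString":
        """Model of 'target := value' where target is declared as cls."""
        target = cls.TYPE_ENCODING
        if target is None:
            return RawString(value.data, value.encoding)  # RAW: never converts
        if value.encoding == target:
            return cls(value.data, target)
        # Convert using the dynamically stored source encoding.
        return cls(value.text().encode(target), target)


class Utf8String(RawString):
    TYPE_ENCODING = "utf-8"
    __slots__ = ()


class Utf16String(RawString):
    TYPE_ENCODING = "utf-16-le"
    __slots__ = ()


def raw_byte_count(s: RawString) -> int:
    # RAW parameter: any string is accepted as-is, nothing is converted.
    return len(s.data)


def utf16_units(s: RawString) -> int:
    # Conceptually declared with a Utf16String parameter; the conversion the
    # compiler would insert at the call is modelled by Utf16String.assign.
    s = Utf16String.assign(s)
    return len(s.data) // 2


word = Utf8String.from_text("Grüße")
raw = RawString.assign(word)
print(type(raw).__name__, raw.encoding)         # RawString utf-8
print(raw_byte_count(word), utf16_units(word))  # 7 5
odd = Utf8String(b"\xfc", "cp1252")  # cp1252 data inside a UTF-8-typed string
print(odd.is_consistent(), odd.text())          # False ü
```

Because `assign` reads the source's dynamic `encoding` field rather than its static type, conversion stays correct even for an inconsistent string; for non-intersexual strings both sources of information agree, which is why either may be used.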

While neither A nor B is compatible with Delphi XE in any way, C seems somewhat similar to what Embarcadero does. But AFAIK, Delphi does not provide an unambiguous, well-defined and understandable paradigm (such as an object-like Parent/Child relationship) for the features of the different string types. So the FPC team should feel free to come up with a clean definition of its own.

-Michael
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
