[fpc-devel] Unicode proceedings

Michael Schnell Tue, 15 Nov 2011 03:34:42 -0800

There have been many long-winded and partly quite fruitless discussions here on the implementation of the new Unicode-aware string type(s).

IMHO, before trying to decide on any implementation details, there should be a _very_explicit_ decision on the general functionality.

I would suggest first deciding between the (IMHO) three sensible, mutually exclusive variants:

A)

Have the new string type(s) maintain the very strict typing paradigm Pascal usually imposes. This IMHO asks for multiple explicit string types such as ANSI_1252_String, UTF8_String, UTF16_String, etc., plus the appropriate single-character types. This of course would allow for automatic conversion without ambiguity. As the types are mutually exclusive, function calls will always do automatic conversions unless an appropriate overloaded function is defined. These string types of course will not include fields denoting their encoding and byte count per code element.
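To make the call behaviour of variant A concrete, here is a minimal sketch. It is written in Python only to model the semantics and is not meant as proposed FPC syntax. The class names, the `convert` helper and the `call` dispatcher are all hypothetical, and the rule that a call without a matching overload converts to the first declared parameter type is an assumption made for the illustration.

```python
# Model of variant A: the encoding is a property of the TYPE,
# so a value stores nothing but its raw bytes.

class TypedString:
    ENCODING = None   # fixed per concrete type, never stored per value
    UNIT_SIZE = None  # bytes per code element, also fixed per type

    def __init__(self, data: bytes):
        self.data = data

    @classmethod
    def from_text(cls, text: str):
        return cls(text.encode(cls.ENCODING))

    def to_text(self) -> str:
        return self.data.decode(self.ENCODING)


class Ansi1252String(TypedString):
    ENCODING, UNIT_SIZE = "cp1252", 1


class Utf8String(TypedString):
    ENCODING, UNIT_SIZE = "utf-8", 1


class Utf16String(TypedString):
    ENCODING, UNIT_SIZE = "utf-16-le", 2


def convert(value, target_type):
    """Conversion is unambiguous: source and target types are both known."""
    if type(value) is target_type:
        return value
    return target_type.from_text(value.to_text())


def call(overloads, arg):
    """Hypothetical call model: use a matching overload if one exists,
    otherwise convert the argument to the first declared parameter type."""
    impl = overloads.get(type(arg))
    if impl is not None:
        return impl(arg)
    target_type, impl = next(iter(overloads.items()))
    return impl(convert(arg, target_type))


# A function declared only for Utf8String: a UTF-16 argument gets converted.
code_units = {Utf8String: lambda s: len(s.data) // Utf8String.UNIT_SIZE}
s16 = Utf16String.from_text("Grüße")
print(call(code_units, s16))  # 7 code elements after conversion to UTF-8

# Once an overload for Utf16String exists, no conversion takes place.
code_units[Utf16String] = lambda s: len(s.data) // Utf16String.UNIT_SIZE
print(call(code_units, s16))  # 5 code elements, counted in UTF-16
```

The point of the sketch is that nothing in a value has to be inspected at run time: which conversion happens, if any, follows from the declared types alone.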

B)

Provide only a single string type that is decently dynamically typed. These strings of course will include fields denoting their encoding and byte count per code element. Here conversions will happen when two differently typed strings are combined in some operation. An empty string would be handled as having no predefined encoding, so that combining it with any other string will not force a conversion. As there is only one string type, function calls will never trigger a conversion, and the encoding of the function result is not predefined. Of course, a single character type is defined that can hold any encoding and supposedly will be done in a way that it in fact is dynamically encoded as well. To enforce that a string is provided in some definite encoding, an appropriate function (or compiler magic) is available.
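Variant B can be modelled in the same way. Again this is a Python sketch of the semantics only, not proposed FPC syntax, and `DynString` and `as_encoding` are hypothetical names. The text above does not say which encoding the result of a mixed operation gets; the sketch assumes the left operand wins.

```python
# Model of variant B: one string type, encoding carried in every value.

UNIT_SIZE = {"cp1252": 1, "utf-8": 1, "utf-16-le": 2}


class DynString:
    def __init__(self, data: bytes = b"", encoding=None):
        if not data:
            encoding = None  # an empty string has no predefined encoding
        elif encoding is None:
            raise ValueError("non-empty data needs an encoding")
        self.data = data
        self.encoding = encoding

    @property
    def unit_size(self):
        """Bytes per code element, or None for the empty string."""
        return UNIT_SIZE.get(self.encoding)

    @classmethod
    def from_text(cls, text: str, encoding: str):
        return cls(text.encode(encoding), encoding)

    def to_text(self) -> str:
        return self.data.decode(self.encoding) if self.data else ""

    def as_encoding(self, encoding: str):
        """The 'enforce a definite encoding' function mentioned above."""
        if not self.data or self.encoding == encoding:
            return self
        return DynString.from_text(self.to_text(), encoding)

    def __add__(self, other):
        # Combining with an empty string never forces a conversion.
        if not self.data:
            return other
        if not other.data:
            return self
        # ASSUMPTION: the left operand's encoding determines the result.
        other = other.as_encoding(self.encoding)
        return DynString(self.data + other.data, self.encoding)


a = DynString.from_text("Grüße", "utf-8")
b = DynString.from_text(" Welt", "utf-16-le")
c = a + b                 # b is converted to UTF-8 here
d = DynString() + b       # no conversion, result stays UTF-16
print(c.encoding, d.encoding)
```

A function taking a `DynString` accepts `a`, `b`, `c` and `d` alike without any conversion; a conversion only occurs inside an operation that mixes encodings or when `as_encoding` is called explicitly.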

C)

Handle these strings similarly to Pascal objects that allow for inheritance (and provide the appropriate operator overloading). So there is a "Parent" string type (aka RAW) that has no predefined encoding and multiple "Child" types that define different encoding enforcements. As the parent type of course needs to include fields denoting its encoding and byte count per code element, the child types of course are implemented in the same way (which in theory allows for "intersexual" strings that feature data encoded correctly but differently from what the type denotes). Here (for non-intersexual strings) conversion can be handled in an unambiguous way (using either the type, if it is not the "Parent" type, or the dynamically given encoding), and a non-RAW target type of ":=" might request yet another conversion. A function definition can either use the Parent type (RAW), and so will not trigger a conversion when being called, or use one of the Child string types and trigger an appropriate automatic conversion. Of course, single-character types matching the Parent and all Child string types are necessary.
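A rough model of variant C, once more in Python purely to illustrate the semantics. `RawString`, `Utf8Str`, `Utf16Str`, `assign` and `is_consistent` are hypothetical names. Of the two options named above, the sketch assumes that a conversion reads the source through its dynamically stored encoding rather than through its declared type.

```python
# Model of variant C: a RAW parent type plus children that enforce an encoding.

class RawString:
    REQUIRED = None  # the parent type has no predefined encoding

    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding  # the field is present in parent AND children

    def to_text(self) -> str:
        return self.data.decode(self.encoding)

    def is_consistent(self) -> bool:
        """False for a value whose stored encoding contradicts its type."""
        return self.REQUIRED in (None, self.encoding)


class Utf8Str(RawString):
    REQUIRED = "utf-8"


class Utf16Str(RawString):
    REQUIRED = "utf-16-le"


def assign(target_type, value):
    """Model of `target := value` where target is declared as target_type."""
    # A RAW target keeps whatever encoding the value already has.
    wanted = target_type.REQUIRED or value.encoding
    if value.encoding == wanted:
        data = value.data
    else:
        # ASSUMPTION: decode via the dynamic encoding field of the source.
        data = value.to_text().encode(wanted)
    return target_type(data, wanted)


raw = RawString("Grüße".encode("utf-16-le"), "utf-16-le")
s8 = assign(Utf8Str, raw)        # child target: converted to UTF-8
back = assign(RawString, s8)     # RAW target: no conversion, stays UTF-8
print(s8.encoding, back.encoding)

# A mismatched value: declared Utf8Str, but the data is really UTF-16.
odd = Utf8Str(b"G\x00", "utf-16-le")
print(odd.is_consistent())       # False
```

Passing a value to a parameter declared as `RawString` corresponds to `assign(RawString, value)` and never converts, while a parameter declared as a child type converts whenever the stored encoding differs.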

While neither A nor B is Delphi XE compatible in any way, C seems a bit similar to what Embarcadero does. But AFAIK, Delphi does not provide an unambiguous, well-defined and understandable paradigm (such as an Object-like Parent/Child relationship) for the features of the different string types. So the FPC team should be free to do a decent definition.


-Michael
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
