On Sat, Jan 4, 2014 at 8:08 AM, Graeme Geldenhuys <mailingli...@geldenhuys.co.uk> wrote:
> On 2014-01-04 04:34, Kostas Michalopoulos wrote:
>> Is there a way to ignore all these and make everything to work with
>> UTF-8? Like setting some global variable that makes all strings
>> (ansistrings) "UTF-8 codepage" or something
>
> We will have to wait for FPC 2.8.0 (or 3.0) which should have much
> better built-in Unicode support. String encoding conversion should then
> be taken care of automatically. Unfortunately it seems that the FPC RTL
> (there will be two of them) will be AnsiString or UTF-16 only. The RTL
> encoding is not configurable!
>
> So under all Unix-like systems (Linux, MacOSX, FreeBSD - basically every
> platform except Microsoft ones) there will be lots of string conversions
> from/to the OS or any libraries (which are normally UTF-8) to the FPC
> RTL which is going to be UTF-16. The constant conversion will also kick
> in when you do streaming to/from file or any TCP/IP communications -
> which both normally use UTF-8.
>
> I would have thought the Free Pascal team would improve their design
> over Delphi. eg: Seeing that automatic encoding conversion is seamless,
> I thought it shouldn't be hard to have native encodings on each
> platform, and the RTL can then be a dynamic Unicode implementation (it
> shouldn't care what encoding is used, as long as it is one of the
> Unicode encodings). By that I mean UTF-8 is used under Unix like
> systems, and UTF-16 under Windows. The UnicodeString type should have
> lived up to its name, and not be an alias for UTF16String. But alas,
> this is not going to happen.
>
> So we as developers have to use UTF-16 everywhere, or define our own
> dynamic types (which really should have been done at RTL level).
> For example:
>
> {$IFDEF Unix}
> RealUnicodeString = UTF8String;
> {$ENDIF}
> {$IFDEF Windows}
> RealUnicodeString = UTF16String;
> {$ENDIF}
>
> Then use the RealUnicodeString type in your applications and frameworks
> to minimise encoding conversions. But like I said, when you do this
> under Unix like systems, you are still going to get conversions when
> talking to the UTF-16 only RTL. Sad, but that is the way the Free Pascal
> team is going.
>
> Once that FPC release is made, then we will start seeing what
> performance impact it will have on all systems. Now is too early to tell.
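To make the conversion cost described in the quote concrete, here is a minimal sketch, assuming Free Pascal and only the System unit's UTF8Encode/UTF8Decode (no Lazarus units). NativeText and NativeToUTF8 are hypothetical names following the RealUnicodeString idea, and UnicodeString stands in for UTF16String on Windows. The program builds some UTF-8 text as if it came from the OS, passes it through a UTF-16 string the way a UTF-16-only RTL would require, and then shows that the native alias needs no conversion on Unix.

```pascal
program ConversionCostSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical platform-native alias in the spirit of RealUnicodeString.
  // UnicodeString is used on Windows because it is FPC's UTF-16 string type.
  {$IFDEF Windows}
  NativeText = UnicodeString;
  {$ELSE}
  NativeText = UTF8String;
  {$ENDIF}

// Hypothetical helper: return UTF-8 for writing to a file or socket.
// Only the Windows branch performs an encoding conversion.
function NativeToUTF8(const S: NativeText): UTF8String;
begin
  {$IFDEF Windows}
  Result := UTF8Encode(S);
  {$ELSE}
  Result := S;
  {$ENDIF}
end;

var
  Wide: UnicodeString;
  Utf8: UTF8String;
  Native: NativeText;
begin
  // Build 'héllo' from code points so the source file's encoding does not matter.
  Wide := 'h';
  Wide := Wide + WideChar($E9) + 'llo';

  // Simulate UTF-8 text as it would arrive from the OS, a file or a socket.
  Utf8 := UTF8Encode(Wide);

  // Handing it to a UTF-16-only routine costs a decode ...
  Wide := UTF8Decode(Utf8);
  // ... and getting the result back as UTF-8 costs an encode.
  Utf8 := UTF8Encode(Wide);

  WriteLn('UTF-16 code units: ', Length(Wide)); // 5
  WriteLn('UTF-8 bytes:       ', Length(Utf8)); // 6

  // With the native alias, Unix code keeps UTF-8 end to end.
  {$IFDEF Windows}
  Native := Wide;
  {$ELSE}
  Native := Utf8;
  {$ENDIF}
  WriteLn('Native -> UTF-8 bytes: ', Length(NativeToUTF8(Native))); // 6
end.
```

On Unix, NativeToUTF8 is a plain string assignment, while the UTF8Decode/UTF8Encode pair is the per-call overhead expected from a UTF-16-only RTL. Whether that overhead matters in practice is, as the quote says, something to measure once the release is out.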
+1

You have always said this, i.e. that UnicodeString should be UTF-8 on Unix
platforms and UTF-16 on Windows, and I have always agreed with you. It makes
sense: this would be a true UnicodeString type.

Is Delphi the only "obstacle" to making this happen?

It should be like this:

=== BEGIN ===
type
  {$IFDEF Unix}
  UnicodeString = UTF8String;
  {$ENDIF}
  {$IFDEF Windows}
  UnicodeString = UTF16String;
  {$ENDIF}

  // the alias ('string' is a reserved word, so only the compiler/RTL
  // could actually map it like this)
  string = UnicodeString;

// the automatic conversions
function UnicodeToUTF8(const S: UnicodeString): UTF8String;
begin
  {$IFDEF Unix}
  Result := S;               // already UTF-8: no conversion
  {$ENDIF}
  {$IFDEF Windows}
  Result := UTF16ToUTF8(S);  // UTF-16 -> UTF-8
  {$ENDIF}
end;

function UnicodeToUTF16(const S: UnicodeString): UTF16String;
begin
  {$IFDEF Unix}
  Result := UTF8ToUTF16(S);  // UTF-8 -> UTF-16
  {$ENDIF}
  {$IFDEF Windows}
  Result := S;               // already UTF-16: no conversion
  {$ENDIF}
end;
=== END ===

Maybe we are overlooking something; there are many details...

Regards,
Marcos Douglas

--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus