In our previous episode, Jonas Maebe said:
> > disaster. I don't want to create and maintain UTF8 versions of
> > nearly every
> > class, even when the class doesn't actually do anything UTF8 specific.
>
> If we support an UTF-8 version of the RTL, then either the code must
> work both for UTF-16 and UTF-8, or it has to be separately maintained
> anyway. And if the code works for both types, generics should enable
> having both without source code duplication.
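The generics idea in the quote can be sketched in Free Pascal (objfpc mode): one class written once and specialised for both string types. This is a minimal illustration under assumptions. `TGNameList`, `TUTF8NameList` and `TUTF16NameList` are hypothetical names, not RTL classes, and it assumes an FPC version that supports generics and `UnicodeString`.

```pascal
program GenericNameList;

{$mode objfpc}{$H+}

type
  // Hypothetical container written once, parameterised by its string type.
  generic TGNameList<TStr> = class
  private
    FItems: array of TStr;
  public
    procedure Add(const S: TStr);
    function Count: Integer;
    function Get(Index: Integer): TStr;
  end;

  // Two specialisations generated from the same source: no duplicated code.
  TUTF8NameList = specialize TGNameList<UTF8String>;
  TUTF16NameList = specialize TGNameList<UnicodeString>;

procedure TGNameList.Add(const S: TStr);
begin
  // Grow the dynamic array by one and store the new item at the end.
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := S;
end;

function TGNameList.Count: Integer;
begin
  Result := Length(FItems);
end;

function TGNameList.Get(Index: Integer): TStr;
begin
  Result := FItems[Index];
end;

var
  L8: TUTF8NameList;
  L16: TUTF16NameList;
begin
  L8 := TUTF8NameList.Create;
  L16 := TUTF16NameList.Create;
  try
    L8.Add('Free Pascal');
    L16.Add('Free Pascal');
    WriteLn(L8.Count, ' ', L16.Count); // prints "1 1"
  finally
    L8.Free;
    L16.Free;
  end;
end.
```

Both specialisations share one implementation. Only places that are genuinely encoding-sensitive would need type-specific code.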

The -UTF8 hack exists simply because of inheritance, and because NOW they still need to insert manual conversions. In the UTF8 and UTF16 RTLs, the inheritance problems are gone by selecting the proper RTL, and the conversions are automated thanks to cpnewstr.

> Note that I'm not specifically arguing here for adding them or not,
> but I don't think the maintenance will be much higher than what will
> be imposed already by requiring that the RTL be compilable for both
> UTF-8 and UTF-16.

I'm not sure that is really the case. First, only the truly encoding-sensitive places are affected in that case. Moreover, most of the remaining changes will be necessary anyway because of Delphi 2009 functionality, even if we only do one RTL. Routines that take input, like TStringList.LoadFromFile, will receive encoding options anyway, as per Delphi/Unicode compatibility. (This is because the encoding of the file you load must be selectable at runtime, and is not necessarily the same as your default encoding.)

So I don't expect compiling the RTL for both UTF8 and UTF16 to require that many encoding-specific changes. Basically it just parameterises the class trees with a string type, and maybe changes the names of some RTL string routines that must be made for UTF8 and UTF16 anyway. (So, more or less, in the UTF8 RTL e.g. TrimLeft is UTF8 and TrimLeftUTF16 is UTF16, while in the UTF16 RTL it is the other way around; but one still has to create both anyway, in any solution.)

The whole point is to avoid messing too much with overloading umpteen string types, and to minimize changes to existing code (FPC, Delphi and Delphi/Unicode alike) through suffixes (-UTF8 etc.). Of course, when one wants to blend UTF16 Delphi code into a UTF8 RTL, fixes might be needed, but at least if your code is _mostly_ UTF16, you have the option to go to the UTF16 RTL. And specifically avoid modifications to virtual methods.
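The risk with overloaded virtual methods can be shown with a minimal Free Pascal sketch. The classes are hypothetical and only loosely modelled on the 32-bit/64-bit `Seek` pair of `TStream` (signatures simplified). A descendant overrides one overload, and a caller that happens to pick the other one silently gets the base implementation.

```pascal
program OverloadedVirtualPitfall;

{$mode objfpc}{$H+}

type
  // Hypothetical base class with two overloaded virtual methods,
  // loosely modelled on TStream's 32-bit and 64-bit Seek (simplified).
  TBaseStream = class
  public
    function Seek(Offset: LongInt): LongInt; overload; virtual;
    function Seek(const Offset: Int64): Int64; overload; virtual;
  end;

  // A descendant whose author only overrode the 32-bit variant.
  TMyStream = class(TBaseStream)
  public
    function Seek(Offset: LongInt): LongInt; overload; override;
  end;

function TBaseStream.Seek(Offset: LongInt): LongInt;
begin
  Result := -1; // base behaviour: "not supported"
end;

function TBaseStream.Seek(const Offset: Int64): Int64;
begin
  Result := -1; // base behaviour: "not supported"
end;

function TMyStream.Seek(Offset: LongInt): LongInt;
begin
  Result := Offset; // the descendant's "real" implementation
end;

var
  S: TBaseStream;
  Pos32: LongInt;
  Pos64: Int64;
begin
  S := TMyStream.Create;
  try
    Pos32 := 10;
    Pos64 := 10;
    WriteLn(S.Seek(Pos32)); // 10: dispatched to TMyStream.Seek
    WriteLn(S.Seek(Pos64)); // -1: falls back to TBaseStream's Int64 version
  finally
    S.Free;
  end;
end.
```

The program compiles without errors. The broken override only shows up at runtime, depending on which overload the caller's argument type selects.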

Having multiple overloaded versions of a virtual method (same name or not) is very dangerous, since people might override only the wrong one (just see the seek32/64 case). Moreover, nobody is wronged in the sense that "his" choice is second-rate and must go through conversion layers, and the general principles are clear to everybody, without exhaustive discussion at every single modification.

This comes at the expense of a few more release binaries to build, and of dealing with bug reports against a different RTL than the one you would typically use. I admit that, but I still think it is a net plus by a wide margin, and the _extra_ work that needs to be done on the code itself is consistently overestimated.

If we remain Delphi-compatible and only allow UTF16, we will over time get bug reports asking for UTF8 variants of everything. Better to require people to submit a working solution for both from the beginning. Every project (Lazarus, Delphi/Unicode compatibility, ANSI legacy) picks an RTL fit for its purpose, and just uses STRING for the bulk of the code.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel