No, on two counts. String[1] is one WideChar (a single UTF-16 code unit), which may or may not be a complete Unicode code point: a code point outside the Basic Multilingual Plane is stored as a surrogate pair of two WideChars. So it equally may or may not be a complete Unicode character, although the definition of what constitutes a "character" in Unicode is a whole separate topic.
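To make the code unit vs code point distinction concrete, here is a small sketch in Java rather than Delphi. Java is used only because its String, like Delphi 2009+'s UnicodeString, stores UTF-16 code units; note that Java indexes strings from 0 where Delphi indexes from 1. The class and method names are hypothetical. A character outside the Basic Multilingual Plane, such as U+1D11E MUSICAL SYMBOL G CLEF, occupies two code units (a surrogate pair), so "A", G clef, "B" has a length of 4 but only 3 code points, and the element at position 1 (Delphi's s[2]) is only half of a code point.

```java
import java.nio.charset.StandardCharsets;

class Utf16Units {
    // Number of UTF-16 code units: the analogue of Delphi's Length(s).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Size in bytes when stored as UTF-16LE (2 bytes per code unit, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // True if the code unit at zero-based index i is half of a surrogate pair,
    // i.e. indexing there does not yield a complete code point.
    static boolean isPartialCodePoint(String s, int i) {
        return Character.isSurrogate(s.charAt(i));
    }

    public static void main(String[] args) {
        // "A", then U+1D11E (outside the BMP, so two code units), then "B".
        String s = "A" + new String(Character.toChars(0x1D11E)) + "B";
        System.out.println(codeUnits(s));             // 4
        System.out.println(codePoints(s));            // 3
        System.out.println(utf16Bytes(s));            // 8
        System.out.println(isPartialCodePoint(s, 1)); // true
    }
}
```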
Length(s) will always yield the number of Chars in s, i.e. the number of UTF-16 code units, not the number of code points. The only wrinkle that Unicode introduces here is that the number of Chars no longer equals the number of *bytes* (each Char is a WideChar and therefore 2 bytes). But you can still reliably index each WideChar in a string using the [n] element index.
Strings in COM have always been WideString; conversion to/from UnicodeString is automatic and lossless (in terms of data). For TCP, yes, you will have to do some work to support Unicode if you haven't already done so. But the internet has been Unicode, if not entirely then in large part, for a long time now, so you really should have taken care of this already, regardless of whether your Delphi code itself is Unicode. The same applies to ANY external system your code interacts with that may already be Unicode (or, indeed, that will remain resolutely ANSI even when your app becomes Unicode).
In addition to the inconveniences for people who had already done some work to support Unicode, the implementation does little or nothing to encourage or promote *correct* Unicode support in new projects, and it introduces potential for confusion and mistakes in many areas, imho. The entire string handling area of the RTL should have been thrown out and replaced with a properly thought-out framework, and yes, we should have been forced to migrate to the new, consistent and comprehensive string RTL (or at least encouraged to, by marking all the existing RTL support as "deprecated"). PLUS, for the backwards compatibility crowd, they *could* have supported a "String == Unicode" compiler switch, imho. That is not just an "I wish they had": I can see technically precisely HOW it could have been implemented, and it fits perfectly with their own advice on how to deal with code that is problematic to convert to Unicode.
Whilst at a technical level this may not have been a huge advantage, it certainly would have been a welcome comfort to people facing the job of converting large applications that depend on libraries of 3rd party code (in some cases no longer supported), by enabling them to "flag" those units as "ANSI" and then deal with the conversion warnings that would subsequently have been emitted when linking with the Unicode VCL.
The only real argument against a compiler switch comes from the view that two versions of the VCL, one Unicode and one ANSI, would have been required, and that this would have been unworkable. This is not the case IMHO. The VCL could have gone unilaterally and fixedly String == UnicodeString, whilst allowing us to compile our own units with String == AnsiString or String == UnicodeString.
As I say, the technique of enforcing ANSI-ness in "unsafe Unicode" units in order to defer the job of migrating those units to Unicode is well documented, and is the official advice in such difficult cases. A compiler switch as I envisage it would simply have made that process more straightforward. The net effect would have been the same, which on its own demonstrates that such a switch was in fact technically possible IF IMPLEMENTED IN THAT WAY, despite the protestations to the contrary (which assume a DIFFERENT implementation approach).
Too late now of course. :)
-----Original Message-----
From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of John Bird
Sent: Tuesday, 23 November 2010 15:36
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
My main remaining question is the best way to handle code that up to now looked like:
  for i := 1 to Length(string1) do
  begin
    DoSomethingWithOneChar(string1[i]);
  end;
If I got the gist correctly, string1[i] is one unicode character, but length(string1) is the number of codepoints in the string and not the number of characters. This is gonna be confusing!
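Assuming the goal is to process whole code points rather than individual WideChars, one way to restructure the quoted loop is to advance by one or two code units depending on whether a surrogate pair starts at the current position. The sketch below is again in Java (whose String is also UTF-16) rather than Delphi; `codePointsOf` and `codeUnitsOf` are hypothetical names, and adding to a list stands in for the quoted `DoSomethingWithOneChar`. It still does not group combining sequences (e.g. a base letter plus an accent) into user-perceived characters, which, as the reply notes, is a separate topic.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointLoop {
    // Equivalent of the quoted loop: visits every UTF-16 code unit, so the two
    // halves of a surrogate pair are handed over separately.
    static List<Integer> codeUnitsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            result.add((int) s.charAt(i));
        }
        return result;
    }

    // Walks s one code point at a time instead of one code unit at a time.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);    // combines a valid surrogate pair into one value
            result.add(cp);               // stand-in for DoSomethingWithOneChar
            i += Character.charCount(cp); // advance by 2 for a pair, otherwise by 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A" + new String(Character.toChars(0x1D11E)) + "B";
        System.out.println(codeUnitsOf(s).size());  // 4
        System.out.println(codePointsOf(s).size()); // 3
    }
}
```

For text that stays within the Basic Multilingual Plane both loops visit the same values, which is why the original loop appears to work in most cases; they diverge only when surrogate pairs are present. An unpaired surrogate is returned as-is by `codePointAt` and advances by one, so the loop always terminates.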
Other comments:
Comment 1 - I saw quite a few commentators say that they generally approved of the way Unicode had been implemented: everything that was AnsiString before is now Unicode, consistently throughout the whole language and IDE, and in the main the only code that needs altering is where Delphi communicates outside the standard language, i.e.:
- DLL calls
- SaveToFile, LoadFromFile and other file access (even here, smart defaults have been put in to retain the expected behaviour)
- Sending strings to COM/TCP etc. (you might need to convert to get the kind expected)
- Database fields (usually handled by making sure the right encoding is sent)
Comment 2 - The worst inconveniences are for those who have already tried to do some Unicode-type processing using WideChar and the functions that were used for it. Undoing these changes is usually the best way to cater for Unicode. Also, some of the routines introduced back then have horribly confusing names, like AnsiPos, which is for searching WideChars and is still what should be used for searching. It seems to me that identical routines with consistent names should be introduced, e.g. UnicodePos(.....), just so that those who are new to Unicode can use at least a consistently named set of tools. I would probably write routines named like this for my own use, just to be clear.
Comment 3 - I see a few people arguing that there should have been a compiler switch to compile String as AnsiString or UnicodeString, to ease converting people to D2009/XE. There are merits either way on this. In the long term, if everyone is going to have to live in a Unicode world, then it's probably better to bite the bullet and be made to convert code, as eventually you cannot escape it. In that case a simpler compiler and VCL is a big advantage. This is somewhat related to being able to cross-compile to 64-bit, iPhone and Android: whichever way makes it easiest to have these forward-looking options.
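To illustrate why the boundary cases in Comment 1 need attention: the in-memory string is UTF-16, but the bytes written to a file, socket or 8-bit API depend on the encoding chosen at that point. The Java sketch below (again standing in for Delphi, with hypothetical names) shows that "café" is 4 code units, 8 bytes as UTF-16LE and 5 bytes as UTF-8, that UTF-8 round-trips losslessly, and that an 8-bit encoding cannot represent every character. ISO-8859-1 is used here purely as an assumed stand-in for an ANSI code page; it has no euro sign, so that character does not survive the round trip.

```java
import java.nio.charset.StandardCharsets;

class BoundaryEncoding {
    // Encode for storage or transmission as UTF-8 (an assumed wire format).
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Round trip through ISO-8859-1 as a stand-in for an 8-bit "ANSI" code page.
    // getBytes replaces unmappable characters, so data can be lost here.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "café"
        System.out.println(cafe.length());                                   // 4
        System.out.println(cafe.getBytes(StandardCharsets.UTF_16LE).length); // 8
        System.out.println(toUtf8(cafe).length);                             // 5
        System.out.println(fromUtf8(toUtf8(cafe)).equals(cafe));             // true
        System.out.println(roundTripLatin1("\u20ac").equals("\u20ac"));      // false
    }
}
```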
The quite stark reality is that in 5 years it looks like much, but not all, commercial software will be running on Windows; the rest is likely to be a mix of Web/iPhone/Android/GoogleOS/MacOS, so the forward portability of compiling Delphi for different environments is way more important than whether String can be compiled as AnsiString.
Comment 4 - Has anyone at Embarcadero considered two ways to go cross-platform? Option A is a native compiler for each OS (best, if it can be done). Option B is the Java route: compile to intermediate code for a Delphi Virtual Machine, which can run interpreted with a runtime on many OSs. It could be called the Delphi Virtual VCL Machine. The reason this might be a good way to go is that Delphi was originally designed as a teaching language, i.e. a very strongly typed and formally well-structured language, so it could be about the best candidate around for generalised compiling and a simple cross-platform runtime. Also, with Java now owned by Oracle, there are questions over whether it has such a bright future, and there is room for another similar approach. DotNet is a similar idea too, but will only ever really be Windows. It might not matter too much if a Delphi Virtual Machine is slower, as long as it is portable. [But I digress - the last point is way off topic for Unicode.]
Comment and question 5 - What is the status of Free Pascal/Lazarus with respect to Unicode? Does Delphi XE code port to Free Pascal or not? It's an issue to consider as well.
_______________________________________________ NZ Borland Developers Group - Delphi mailing list Post: delphi@delphi.org.nz Admin: http://delphi.org.nz/mailman/listinfo/delphi Unsubscribe: send an email to delphi-requ...@delphi.org.nz with Subject: unsubscribe