Doh! Thanks, Jolyon, for clearing up that misunderstanding on my part. I was aware of the surrogate pair issue, but I wrongly assumed that the iterator implementation would take care of it. I guess not.
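For the archive, here is a minimal sketch of what a surrogate-aware loop might look like, since neither loop form does this for you. It assumes Delphi 2009 or later, where string data is UTF-16; CountCodePoints is a hypothetical helper written for this example, not an RTL routine, and the range checks are spelled out by hand rather than relying on any library call. An unpaired surrogate is simply counted as one character here, which may not be what every application wants.

```delphi
program SurrogateDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the RTL): counts Unicode code points
// in a UTF-16 string by treating each valid surrogate pair as one character.
function CountCodePoints(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    // High surrogate ($D800..$DBFF) followed by low surrogate ($DC00..$DFFF):
    // the two WideChars together form a single character outside the BMP.
    // Short-circuit evaluation (the default) keeps S[I + 1] in range.
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
       (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I); // BMP character, or an unpaired surrogate counted on its own
    Inc(Result);
  end;
end;

var
  S: UnicodeString;
begin
  // 'a', U+1D11E (encoded as the pair $D834 $DD1E), 'b'
  S := 'a' + #$D834 + #$DD1E + 'b';
  Writeln(Length(S));          // 4 UTF-16 code units
  Writeln(CountCodePoints(S)); // 3 characters
end.
```

The same skip-two-on-a-pair test could be dropped into an existing for i := 1 to Length() routine by switching it to a while loop, which is probably less work than converting everything to UTF32 first.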
Thanks again!

Cheers,
Colin

On 23 November 2010 13:06, Jolyon Smith <jsm...@deltics.co.nz> wrote:

> Colin, the for C in loop and the for i := 1 to Length() loops are
> functionally identical! The only difference is that the “for in” version
> incurs the slight overhead of the enumerator framework invoked by the
> compiler and runtime magic to support that syntax.
>
> But in neither case will the loop itself help detect/respond to surrogate
> pairs (a single “WideChar” is potentially only ½ the data required to form a
> complete “*character*”). The only way to reduce an iterator over a string
> to a simple char-wise loop, whether explicit or using enumerators, is to
> first convert to UTF32, the facilities for which in the Delphi RTL are
> <cough> rudimentary, to put it politely. Non-existent may be nearer the
> mark.
>
> The precise mechanics of the loop construct used are not material to that
> problem.
>
> However, just as before Unicode when most people didn’t care and just wrote
> code that assumed ANSI==ASCII, these days people won’t care and will write
> code that assumes that Unicode==BMP (Basic Multilingual Plane), ignoring
> surrogate pairs just as they used to ignore extended ASCII and ANSI
> characters.
>
> And for most people, that will probably actually work.
>
> J
>
> *From:* delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz]
> *On Behalf Of* Colin Johnsun
> *Sent:* Tuesday, 23 November 2010 14:31
> *To:* NZ Borland Developers Group - Delphi List
> *Subject:* Re: [DUG] Upgrading to XE - Unicode strings questions
>
> I won't answer everything but just on this one question:
>
> On 23 November 2010 11:04, John Bird <johnkb...@paradise.net.nz> wrote:
>
> Extra question:
>
> It looks like code like
>
> for i := 1 to Length(string1) do
> begin
>   DoSomethingWithOneChar(string1[i]);
> end;
>
> cannot be used reliably. The problem is that Length(string1) looks like it
> cannot be safely used, as a Unicode character may take up 2 code units
> (a surrogate pair), and Length(string1) highlights that there is a
> difference between the number of Unicode characters in a string and the
> number of code units. Still figuring out what the best practice is here,
> as I have quite a lot of string routines. Should be OK as long as the
> Unicode text actually is ASCII.
>
> you can use something like this:
>
> var
>   C: Char;
> ...
> for C in String1 do
> begin
>   DoSomethingWithOneChar(C);
> end;
>
> In this case you don't need to know the index of each character, you just
> get the char using the for..in..do loop.
>
> _______________________________________________
> NZ Borland Developers Group - Delphi mailing list
> Post: delphi@delphi.org.nz
> Admin: http://delphi.org.nz/mailman/listinfo/delphi
> Unsubscribe: send an email to delphi-requ...@delphi.org.nz with Subject:
> unsubscribe