On Friday, April 17, 2015 at 12:41:06 PM UTC-4, Steven G. Johnson wrote:
>
> On Friday, April 17, 2015 at 11:50:44 AM UTC-4, Scott Jones wrote:
>>
>> Ugh... for some of what I'm doing, it is nice to know that a string
>> contains only ASCII characters, I really hope that you don't go ahead with
>> removing ASCIIString.
>>
>
> You can always call isascii(...) on a UTF-8 string ... can you explain
> why you care in your application?
>
I assume that is then an O(n) function, is it not? I'd rather have
something that is O(1)!

>> (I can treat it as ANSI Latin 1, without any modification,
>>
>
> You can treat it as UTF-8, without any modification...
>

I don't want anything that requires O(n) operations for a lot of the
string handling I need to do...

>> or expand it to UTF-16 or UTF-32 by just widening bytes to 16-bit or
>> 32-bit words, which I can do *very*
>> fast in x86-64 assembly, esp. with some of the newer instructions!)
>>
>
> Why would you want to do this? Except for interop with foreign libraries?
> UTF-16 is the worst of all worlds as an encoding.
>

The UTF-16 that I deal with has no surrogate pairs... it is really just UCS2...
It depends on what you are doing with UTF-16... and yes, interop with Java,
JDBC, lots of databases... UTF-16 is very useful... UTF-8 can blow things up
to 3 bytes per character, potentially taking 1.5 times the space of
UTF-16... not good!

>> I do also think it would be nice to have 8-bit (ANSI Latin 1 or binary,
>> not UTF-8), 16-bit (UCS2)
>>
>
> Again, for interop with legacy files? I can see no other reason to use
> Latin-1 (or Windows 1252) or UCS2 these days.
>

Latin-1 is a strict subset of Unicode (I'd never use anything other than
Unicode), but both ASCII and ANSI Latin-1 are just subsets, and as such are
rather useful to save space when most of the time you are just dealing with
text from Western Europe, the Americas, Australia & New Zealand...
It's all about performance, both when doing string handling, and when
saving/reading something to a database (or sending it over a wire)...

Scott
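
To make the trade-offs in this exchange concrete, here is a minimal sketch. It is written in Python rather than Julia purely for illustration, and every name in it (`is_ascii_scan`, `TaggedBytes`, `widen_to_utf16le`, `encoded_sizes`) is hypothetical and not part of any library discussed in the thread. It shows three things under those assumptions: an ASCII check that has to scan every byte, a wrapper that pays for that scan once so later queries are O(1), and the byte-widening step from ASCII/Latin-1 to 16-bit code units, plus a size comparison between UTF-8 and UTF-16.

```python
from dataclasses import dataclass, field


def is_ascii_scan(data: bytes) -> bool:
    """O(n) check: inspects every byte until a non-ASCII one is found."""
    return all(b < 0x80 for b in data)


@dataclass(frozen=True)
class TaggedBytes:
    """Hypothetical wrapper: scans once at construction, then answers in O(1)."""
    data: bytes
    is_ascii: bool = field(init=False)

    def __post_init__(self) -> None:
        # frozen dataclass, so the cached flag is set via object.__setattr__
        object.__setattr__(self, "is_ascii", is_ascii_scan(self.data))


def widen_to_utf16le(data: bytes) -> bytes:
    """Zero-extend each ASCII/Latin-1 byte to a 16-bit little-endian code unit.

    This is valid because code points U+0000..U+00FF share their numeric
    values with Latin-1 bytes.
    """
    out = bytearray(2 * len(data))
    out[0::2] = data  # low bytes; the high bytes stay zero
    return bytes(out)


def encoded_sizes(text: str) -> dict:
    """Byte counts of the same text in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    s = TaggedBytes(b"hello")
    print(s.is_ascii)                       # cached flag, no rescan
    print(widen_to_utf16le(b"caf\xe9"))     # Latin-1 bytes widened to UTF-16LE
    print(encoded_sizes("hello"))           # ASCII: UTF-8 is half the size
    print(encoded_sizes("\u65e5\u672c\u8a9e"))  # CJK: UTF-8 is 1.5x UTF-16
```

The size comparison cuts both ways: pure ASCII text takes half the space in UTF-8, while characters in the U+0800..U+FFFF range take three bytes in UTF-8 against two in UTF-16. A dedicated ASCII or Latin-1 type amounts to carrying the `is_ascii` flag in the type itself instead of in a field.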
