On Friday, April 17, 2015 at 12:41:06 PM UTC-4, Steven G. Johnson wrote:
>
>
>
> On Friday, April 17, 2015 at 11:50:44 AM UTC-4, Scott Jones wrote:
>>
>> Ugh... for some of what I'm doing, it is nice to know that a string 
>> contains only ASCII characters,  I really hope that you don't go ahead with 
>> removing ASCIIString.
>>
>
> You can always call isascii(...) on a UTF-8 string  ... can you explain 
> why you care in your application?
>

I assume that is then an O(n) function, is it not?

I'd rather have something that is O(1)!
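
To make the O(n) vs. O(1) point concrete, here is a minimal Python sketch
(not Julia's actual implementation; `TaggedString` and `is_ascii_scan` are
hypothetical names made up for illustration). It assumes the string is
immutable, so the ASCII check can be paid once at construction and then
read back as a stored flag:

```python
def is_ascii_scan(data: bytes) -> bool:
    """O(n): has to look at every byte each time it is called."""
    return all(b < 0x80 for b in data)


class TaggedString:
    """Hypothetical wrapper that remembers whether its bytes are all ASCII."""

    def __init__(self, data: bytes):
        self.data = data
        # One O(n) scan at construction; the result is cached.
        self.ascii_only = is_ascii_scan(data)

    def is_ascii(self) -> bool:
        # O(1): just reads the cached flag, no scan.
        return self.ascii_only


plain = TaggedString("hello".encode("utf-8"))
accented = TaggedString("héllo".encode("utf-8"))
print(plain.is_ascii(), accented.is_ascii())
```

A separate ASCII string type gives the same guarantee through the type
system instead of a runtime flag.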
 

>  
>
>> (I can treat it as ANSI Latin 1, without any modification,
>>
>
> You can treat it as UTF-8, without any modification...
>

I don't want anything that requires O(n) operations for a lot of the string 
handling I need to do... 

>  
>
>> or expand it to UTF-16 or UTF-32 by just widening bytes to 16-bit or 
>> 32-bit words, which I can do *very*
>> fast in x86-64 assembly, esp. with some of the newer instructions!)
>>
>
> Why would you want to do this?  Except for interop with foreign libraries? 
>  UTF-16 is the worst of all worlds as an encoding.
>
>
The UTF-16 that I deal with has no surrogate pairs... it is really just UCS-2...
It depends on what you are doing with UTF-16...  and yes, interop with Java, 
JDBC, lots of databases... UTF-16 is very useful...
UTF-8 can take up to 3 bytes per character in the BMP, potentially taking 1.5 
times the space of UTF-16... not good!
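
A quick way to check the size arithmetic is a rough Python sketch like the
one below (`encoded_sizes` is a hypothetical helper, and the sample strings
are just illustrations). It uses "utf-16-le" so that no byte-order mark is
counted:

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII": "hello",        # 1 byte per char in UTF-8, 2 in UTF-16
    "Latin-1": "héllo",      # é takes 2 bytes in UTF-8, 2 in UTF-16
    "CJK (BMP)": "日本語",   # 3 bytes per char in UTF-8, 2 in UTF-16
}

for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

For the CJK sample that is 9 bytes vs. 6, i.e. the 1.5x factor; for pure
ASCII it goes the other way, 5 bytes vs. 10.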
 

>> I do also think it would be nice to have 8-bit (ANSI Latin 1 or binary, 
>> not UTF-8), 16-bit (UCS2)
>>
>
> Again, for interop with legacy files?  I can see no other reason to use 
> Latin-1 (or Windows 1252) or UCS2 these days.
>

Latin-1 is a strict subset of Unicode (I'd never use anything other than 
Unicode - but both ASCII and ANSI Latin-1 are just subsets),
and as such it is rather useful for saving space when most of the time you 
are just dealing with text from Western Europe, the Americas, Australia & 
New Zealand...
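
Because each Latin-1 byte value equals its Unicode code point, widening to
UTF-16 is nothing more than zero-extending every byte. Here is a small
Python sketch of that idea (`widen_latin1_to_utf16le` is a hypothetical
name; this is not the assembly version, just the same transformation
written out, assuming little-endian output):

```python
def widen_latin1_to_utf16le(data: bytes) -> bytes:
    """Zero-extend each Latin-1 byte to a 16-bit little-endian code unit."""
    out = bytearray(2 * len(data))  # all zeros to start with
    out[0::2] = data                # low bytes = Latin-1 bytes, high bytes stay 0
    return bytes(out)


latin1 = "café".encode("latin-1")   # 4 bytes, one per character
widened = widen_latin1_to_utf16le(latin1)

# Same result as a real UTF-16 encode, with no per-character decoding logic.
print(widened == "café".encode("utf-16-le"))
print(len(latin1), len("café".encode("utf-8")))
```

No table lookups or validation are needed, which is why this kind of loop
vectorizes so well.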

It's all about performance, both when doing string handling, and when 
saving/reading something to a database (or sending it over a wire)...

Scott 
