On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
Deborah Goldsmith wrote:
UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon,
and is what appears in the APIs for all of them. UTF-16 is also what's
stored in the volume catalog on Mac disks. UTF-8 is only used in BSD
APIs for backward compatibility. It's also used in plain text files
(or XML or HTML), again for compatibility.
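To make the distinction concrete, here is a small illustrative Python sketch (not from the thread). It shows the same short string in two forms: the 16-bit UTF-16 code units that a UTF-16 API would see, and the UTF-8 bytes that a BSD API or a plain text file would see. The helper names are hypothetical.

```python
from typing import List

def utf16_code_units(s: str) -> List[int]:
    """Return the string as a list of 16-bit UTF-16 code units."""
    data = s.encode("utf-16-le")  # little-endian, no BOM
    # Reassemble each pair of bytes into one 16-bit unit value.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def utf8_bytes(s: str) -> List[int]:
    """Return the string as a list of UTF-8 bytes."""
    return list(s.encode("utf-8"))

text = "café"
print(utf16_code_units(text))  # [99, 97, 102, 233]
print(utf8_bytes(text))        # [99, 97, 102, 195, 169]
```

The "é" is a single 16-bit unit in UTF-16 but two bytes in UTF-8, so crossing between the UTF-16 APIs and the BSD/file world always involves a conversion.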
Deborah
On OS X, Cocoa and Carbon use Core Foundation, whose API does not have
a one-true-encoding internally. Follow the rather long URL for details:
http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
I would vote for an API that not only hides the internal store but
also allows different internal stores to be used in a mostly
compatible way. However, there is a UniChar typedef on OS X, which is
the same unsigned 16-bit integer that Java's JNI uses for jchar.
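One consequence of a 16-bit UniChar is worth spelling out. A character outside the Basic Multilingual Plane (above U+FFFF) does not fit in one unit and is stored as a surrogate pair of two units. Below is a minimal Python sketch of that arithmetic. `to_unichars` and `pack_unichars` are hypothetical helpers, and the packed result is checked against Python's own UTF-16 codec.

```python
import struct
from typing import List

def to_unichars(code_point: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units (16-bit values)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]        # fits in a single 16-bit unit
    v = code_point - 0x10000       # 20 bits remain to be split
    high = 0xD800 + (v >> 10)      # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)     # low (trail) surrogate: bottom 10 bits
    return [high, low]

def pack_unichars(units: List[int]) -> bytes:
    """Pack code units as little-endian unsigned 16-bit integers."""
    return struct.pack("<%dH" % len(units), *units)

units = to_unichars(0x1D11E)      # U+1D11E MUSICAL SYMBOL G CLEF
print([hex(u) for u in units])    # ['0xd834', '0xdd1e']
assert pack_unichars(units) == "\U0001D11E".encode("utf-16-le")
```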
UTF-16 is the type used in all the APIs. Everything else is
considered an encoding conversion.
CoreFoundation uses UTF-16 internally except when the string fits
entirely in a single-byte legacy encoding like MacRoman or
MacCyrillic. If any kind of Unicode processing needs to be done to
the string, it is first coerced to UTF-16. If it weren't for
backward-compatibility issues, I think we'd use UTF-16 all the time,
since the machinery for switching encodings adds complexity. I
wouldn't advise that mixed-encoding approach for a new library.
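The storage strategy described above can be modelled roughly as follows. This is a hypothetical Python sketch, not Apple's implementation. `DualStoreString` is an invented name, and the sketch assumes a single fixed legacy encoding (MacRoman, via Python's `mac_roman` codec), whereas the mail mentions several.

```python
class DualStoreString:
    """Hypothetical: keep a compact single-byte store when the text fits
    in one legacy encoding, otherwise fall back to UTF-16."""

    LEGACY = "mac_roman"  # assumption: one fixed legacy encoding

    def __init__(self, text: str):
        try:
            self._store = text.encode(self.LEGACY)
            self._encoding = self.LEGACY
        except UnicodeEncodeError:
            self._store = text.encode("utf-16-le")
            self._encoding = "utf-16-le"

    @property
    def encoding(self) -> str:
        return self._encoding

    def to_utf16(self) -> bytes:
        """Coerce to UTF-16 (little-endian, no BOM) before Unicode processing."""
        if self._encoding == "utf-16-le":
            return self._store
        return self._store.decode(self._encoding).encode("utf-16-le")

    def upper(self) -> "DualStoreString":
        # Unicode processing goes through UTF-16 first, as the mail describes.
        text = self.to_utf16().decode("utf-16-le")
        return DualStoreString(text.upper())

    def __str__(self) -> str:
        return self._store.decode(self._encoding)

print(DualStoreString("café").encoding)    # mac_roman
print(DualStoreString("Привет").encoding)  # utf-16-le
```

The encoding check in every method is the extra machinery Deborah refers to. A UTF-16-only design would drop `_encoding` and the fallback branch entirely.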
Deborah
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe