On Tue, 2007-10-02 at 08:02 -0700, Deborah Goldsmith wrote:
> On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
> > Deborah Goldsmith wrote:
> >
> >> UTF-16 is the native encoding used for Cocoa, Java, ICU, and
> >> Carbon, and is what appears in the APIs for all of them. UTF-16 is
> >> also what's stored in the volume catalog on Mac disks. UTF-8 is only
> >> used in BSD APIs for backward compatibility. It's also used in plain
> >> text files (or XML or HTML), again for compatibility.
> >>
> >> Deborah
> >
> >
> > On OS X, Cocoa and Carbon use Core Foundation, whose API does not
> > have a one-true-encoding internally. Follow the rather long URL for
> > details:
> >
> > http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
> >
> > I would vote for an API that not just hides the internal store, but
> > allows different internal stores to be used in a mostly compatible
> > way.
> >
> > However, There is a UniChar typedef on OS X which is the same
> > unsigned 16 bit integer as Java's JNI would use.
>
> UTF-16 is the type used in all the APIs. Everything else is
> considered an encoding conversion.
>
> CoreFoundation uses UTF-16 internally except when the string fits
> entirely in a single-byte legacy encoding like MacRoman or
> MacCyrillic. If any kind of Unicode processing needs to be done to
> the string, it is first coerced to UTF-16. If it weren't for
> backwards compatibility issues, I think we'd use UTF-16 all the time
> as the machinery for switching encodings adds complexity. I wouldn't
> advise it for a new library.
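The quoted messages treat UTF-16 as the type used in the OS X APIs and regard everything else as an encoding conversion. As a rough illustration, here is a minimal sketch of that kind of conversion. It is written in Python for brevity, and the function names are hypothetical: they are not part of any Haskell, Core Foundation, or JNI API. The sketch converts UTF-8 bytes, such as a BSD/Linux API would supply, into 16-bit UTF-16 code units of the width OS X's UniChar holds, and then converts them back. Characters above U+FFFF are split into surrogate pairs.

```python
# Sketch only: hypothetical helpers illustrating the UTF-8 <-> UTF-16 conversion
# an FFI binding would perform; not part of any Haskell, Core Foundation, or JNI API.


def to_utf16_units(text: str) -> list[int]:
    """Encode code points as 16-bit UTF-16 code units (the width of OS X's UniChar)."""
    units: list[int] = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Basic Multilingual Plane: one code unit.
            units.append(cp)
        else:
            # Supplementary planes: split into a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
    return units


def from_utf16_units(units: list[int]) -> str:
    """Decode UTF-16 code units back to a string, recombining surrogate pairs."""
    chars: list[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            chars.append(chr(cp))
            i += 2
        else:
            # BMP unit; an unpaired surrogate is passed through unchanged.
            chars.append(chr(u))
            i += 1
    return "".join(chars)


def utf8_bytes_to_utf16_units(data: bytes) -> list[int]:
    """Boundary conversion: UTF-8 bytes (e.g. from a BSD/Linux API) to UTF-16 units."""
    return to_utf16_units(data.decode("utf-8"))


def utf16_units_to_utf8_bytes(units: list[int]) -> bytes:
    """Reverse boundary conversion; raises UnicodeEncodeError on unpaired surrogates."""
    return from_utf16_units(units).encode("utf-8")


if __name__ == "__main__":
    raw = "Haskell λ 𝄞".encode("utf-8")       # 15 bytes of UTF-8
    units = utf8_bytes_to_utf16_units(raw)    # 12 UTF-16 code units
    print([hex(u) for u in units[-2:]])       # prints ['0xd834', '0xdd1e'] (U+1D11E)
    assert utf16_units_to_utf8_bytes(units) == raw
```

For well-formed input the round trip is lossless. The cost is one linear pass over the string each time it crosses the boundary.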
I would like, again, to argue strongly against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. FFI bindings have to convert data formats in any case; Haskell shouldn't gratuitously break Linux support (or make life harder on Linux) just to support proprietary operating systems better.

Now, if UTF-16 is (objectively) better /independent of the details of MacOS X/, it can be converted to anything by the FFI. But I am strongly opposed to doing it the way Java or MacOS X or Win32 or anyone else does it at the expense of Linux.

jcc

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe