I'll look over the proposal more carefully when I get time, but the
most important issue is to not let the storage type leak into the
interface.
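To make that concrete, here is a minimal sketch of what I mean by keeping the storage type out of the interface. Everything in it is hypothetical and not taken from the proposal: the names UString, pack, unpack and ulength are made up, and the list of UTF-16 code units is only a stand-in for a real packed buffer. Callers only ever see Char and String, so the representation could later change without touching them.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- In a real library this would sit behind an export list that omits the
-- constructor, for example:
--   module UString (UString, pack, unpack, ulength) where

-- Hypothetical string type. The field holds UTF-16 code units, but no
-- function below exposes Word16 in its signature.
newtype UString = UString [Word16]

-- Build a UString from an ordinary String.
pack :: String -> UString
pack = UString . concatMap encodeChar

-- Recover the characters.
unpack :: UString -> String
unpack (UString ws) = decode ws

-- Length in characters (code points), not in code units.
ulength :: UString -> Int
ulength = length . unpack

-- One code unit for BMP characters, a surrogate pair otherwise.
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
  where
    n = ord c
    m = n - 0x10000

isHigh, isLow :: Word16 -> Bool
isHigh w = w >= 0xD800 && w <= 0xDBFF
isLow  w = w >= 0xDC00 && w <= 0xDFFF

-- Combine surrogate pairs; pass every other code unit through unchanged.
decode :: [Word16] -> String
decode (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                   + (fromIntegral lo - 0xDC00)) : decode rest
decode (w : rest) = chr (fromIntegral w) : decode rest
decode []         = []
```

Swapping the list for UTF-8 bytes or UTF-32 words would change only the bodies of pack, unpack and ulength, which is the point.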
From an implementation point of view, UTF-16 is the most efficient
representation for processing Unicode. It's the native Unicode
representation for Windows, Mac OS X, and the ICU open source i18n
library. UTF-8 is compact only for text that is mostly ASCII, such as
English. Its most valuable property is compatibility with software that
treats character strings as byte arrays, and in fact that's why it was
invented.
UTF-32 is conceptually cleaner, but characters outside the BMP (Basic
Multilingual Plane) are rare in actual text, so UTF-16 turns out to
be the best combination of space and time efficiency.
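A rough way to see the space side of this trade-off is to count encoded bytes per character under each encoding form. The sketch below is only an illustration: the function names are hypothetical, it counts bytes without producing them, and it says nothing about processing speed.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one character in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- UTF-16: one 16-bit unit inside the BMP, a surrogate pair outside it.
utf16Bytes :: Char -> Int
utf16Bytes c = if ord c < 0x10000 then 2 else 4

-- UTF-32: always one 32-bit unit.
utf32Bytes :: Char -> Int
utf32Bytes _ = 4

-- Total (UTF-8, UTF-16, UTF-32) sizes in bytes for a whole string.
encodedSizes :: String -> (Int, Int, Int)
encodedSizes s =
  ( sum (map utf8Bytes s)
  , sum (map utf16Bytes s)
  , sum (map utf32Bytes s) )
```

For example, encodedSizes "hello" gives (5,10,20), while the three BMP characters "\x65E5\x672C\x8A9E" give (9,6,12). A single character outside the BMP such as '\x1D11E' costs 4 bytes in all three forms.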
Deborah
On Sep 24, 2007, at 3:52 PM, Johan Tibell wrote:
Dear haskell-cafe,
I would like to propose a new, ByteString-like Unicode string library
which can be used where both efficiency (currently offered by
ByteString) and i18n support (currently offered by vanilla Strings)
are needed. I wrote a skeleton draft today but I'm a bit tired so I
didn't get all the details down. Nevertheless I think it's fleshed out
enough for some initial feedback. If I can get the important parts
nailed down before the Hackathon I could hack on it there.
Apologies for not getting everything we discussed on #haskell down in
the first draft. It'll get in there eventually.
Bring out your Unicode kung-fu!
http://haskell.org/haskellwiki/UnicodeByteString
Cheers,
Johan Tibell
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe