I'll look over the proposal more carefully when I get time, but the most important issue is to not let the storage type leak into the interface.
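
To make that point concrete, here is a minimal sketch, not taken from the proposal. The names UString, pack, unpack and ulength are hypothetical, and the list of UTF-16 code units stands in for whatever packed storage a real library would use. The idea is that callers only ever see Char and String, so the representation can change without touching the interface.

```haskell
-- In a real library this would be its own module that exports the type
-- WITHOUT its constructor, for example:
--   module UString (UString, pack, unpack, ulength) where
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Hypothetical storage: UTF-16 code units. Callers never see this.
newtype UString = UString [Word16]

-- Build a UString from an ordinary String.
pack :: String -> UString
pack = UString . concatMap encodeChar

-- Recover the characters; surrogate pairs are joined back together.
unpack :: UString -> String
unpack (UString us) = decode us

-- Length in code points, not in storage units.
ulength :: UString -> Int
ulength = length . unpack

-- Encode one character as one or two UTF-16 code units.
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                  , fromIntegral (0xDC00 + m `mod` 0x400) ]
  where
    n = ord c
    m = n - 0x10000

-- Decode UTF-16 code units; an unpaired surrogate is passed through as-is.
decode :: [Word16] -> String
decode (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400
                   + (fromIntegral lo - 0xDC00)) : decode rest
decode (u : rest) = chr (fromIntegral u) : decode rest
decode []         = []

isHigh, isLow :: Word16 -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow  u = u >= 0xDC00 && u <= 0xDFFF
```

Since nothing in the signatures of pack, unpack and ulength mentions Word16, swapping the storage for UTF-8 or UTF-32 later would not break client code.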

From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode. It's the native Unicode representation for Windows, Mac OS X, and the ICU open source i18n library. UTF-8 is not very efficient for anything except mostly-ASCII text such as English. Its most valuable property is compatibility with software that thinks of character strings as byte arrays, and in fact that's why it was invented.

UTF-32 is conceptually cleaner, but characters outside the BMP (Basic Multilingual Plane) are rare in actual text, so UTF-16 turns out to be the best combination of space and time efficiency.
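
The space side of that trade-off can be checked with a small sketch. The function names below are made up for illustration, and the sizes follow directly from the definitions of the three encoding forms. It only measures storage and says nothing about processing time.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one character in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed in UTF-16: one 16-bit unit inside the BMP, two outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- UTF-32 always uses one 32-bit unit.
utf32Bytes :: Char -> Int
utf32Bytes _ = 4

-- Total size of a string as (UTF-8, UTF-16, UTF-32) bytes.
encodedSizes :: String -> (Int, Int, Int)
encodedSizes s =
  ( sum (map utf8Bytes s)
  , sum (map utf16Bytes s)
  , sum (map utf32Bytes s) )

-- encodedSizes "hello"                 == (5, 10, 20)
-- encodedSizes "\26085\26412\35486"    == (9, 6, 12)   -- three CJK characters
-- encodedSizes "\x1D11E"               == (4, 4, 4)    -- one non-BMP character
```

For ASCII text UTF-8 is the smallest, for BMP text outside the ASCII and Latin ranges UTF-16 is the smallest, and UTF-32 only breaks even on characters outside the BMP.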

Deborah

On Sep 24, 2007, at 3:52 PM, Johan Tibell wrote:

Dear haskell-cafe,

I would like to propose a new, ByteString-like Unicode string library
which can be used where both efficiency (currently offered by
ByteString) and i18n support (currently offered by vanilla Strings)
are needed. I wrote a skeleton draft today but I'm a bit tired so I
didn't get all the details down. Nevertheless I think it's fleshed out
enough for some initial feedback. If I can get the important parts nailed
down before the Hackathon I could hack on it there.

Apologies for not getting everything we discussed on #haskell down in
the first draft. It'll get in there eventually.

Bring out your Unicode kung-fu!

http://haskell.org/haskellwiki/UnicodeByteString

Cheers,

Johan Tibell
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
