Re: [Haskell-cafe] PROPOSAL: New efficient Unicode string library.

Deborah Goldsmith Tue, 25 Sep 2007 19:47:27 -0700

I'll look over the proposal more carefully when I get time, but the most important issue is to not let the storage type leak into the interface.
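As a rough illustration of what "not leaking the storage type" could look like in Haskell, here is a minimal sketch. The module name `UString` and the functions `pack`, `unpack`, and `codePointCount` are hypothetical and are not taken from the proposal; the list of `Char` merely stands in for whatever packed buffer a real library would use.

```haskell
module UString
  ( UString          -- type exported, constructor deliberately hidden
  , pack
  , unpack
  , codePointCount
  ) where

-- Assumption for illustration only: a plain list of code points stands in
-- for the real storage (a packed UTF-8, UTF-16, or UTF-32 buffer).
newtype UString = UString [Char]

-- | Build a UString from an ordinary Haskell String.
pack :: String -> UString
pack = UString

-- | Convert back to an ordinary Haskell String.
unpack :: UString -> String
unpack (UString cs) = cs

-- | Length in code points, which does not depend on the storage encoding.
codePointCount :: UString -> Int
codePointCount (UString cs) = length cs
```

Because the constructor is not exported, client code can only go through these functions, so the representation could later be swapped without changing the interface.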

From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode. It's the native Unicode representation for Windows, Mac OS X, and the ICU open source i18n library. UTF-8 is not very efficient for anything except English. Its most valuable property is compatibility with software that thinks of character strings as byte arrays, and in fact that's why it was invented.

UTF-32 is conceptually cleaner, but characters outside the BMP (Basic Multilingual Plane) are rare in actual text, so UTF-16 turns out to be the best combination of space and time efficiency.
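The space side of this comparison can be checked with a small sketch that counts how many bytes a string needs in each encoding form. The function names (`utf8Units`, `utf16Units`, `encodedBytes`) and the sample strings are made up for illustration, and the sketch says nothing about processing time.

```haskell
module Main (main) where

import Data.Char (ord)

-- | Number of 8-bit code units (bytes) one code point needs in UTF-8.
utf8Units :: Char -> Int
utf8Units c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- | Number of 16-bit code units one code point needs in UTF-16.
utf16Units :: Char -> Int
utf16Units c
  | ord c < 0x10000 = 1   -- inside the BMP
  | otherwise       = 2   -- outside the BMP: surrogate pair

-- | Total size in bytes as (UTF-8, UTF-16, UTF-32).
encodedBytes :: String -> (Int, Int, Int)
encodedBytes s =
  ( sum (map utf8Units s)
  , 2 * sum (map utf16Units s)
  , 4 * length s
  )

main :: IO ()
main = mapM_ (print . encodedBytes) samples
  where
    samples =
      [ "hello"                               -- ASCII
      , "\1087\1088\1080\1074\1077\1090"      -- Cyrillic, 6 code points
      , "\26085\26412\35486"                  -- CJK, 3 code points
      , "\119070"                             -- U+1D11E, outside the BMP
      ]
```

For these samples the ASCII string is smallest in UTF-8, the Cyrillic string ties between UTF-8 and UTF-16, the CJK string is smallest in UTF-16, and the single non-BMP character takes four bytes in all three forms.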


Deborah

On Sep 24, 2007, at 3:52 PM, Johan Tibell wrote:

Dear haskell-cafe,

I would like to propose a new, ByteString-like Unicode string library
which can be used where both efficiency (currently offered by
ByteString) and i18n support (currently offered by vanilla Strings)
are needed. I wrote a skeleton draft today but I'm a bit tired so I
didn't get all the details. Nevertheless I think it's fleshed out enough
for some initial feedback. If I can get the important parts nailed
down before Hackathon I could hack on it there.

Apologies for not getting everything we discussed on #haskell down in
the first draft. It'll get in there eventually.

Bring out your Unicode kung-fu!

http://haskell.org/haskellwiki/UnicodeByteString

Cheers,

Johan Tibell
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

