Re: [Haskell-cafe] PROPOSAL: New efficient Unicode string library.

Duncan Coutts Wed, 26 Sep 2007 10:46:44 -0700

In message <[EMAIL PROTECTED]> Jonathan Cast <[EMAIL PROTECTED]> writes:
> On Wed, 2007-09-26 at 09:05 +0200, Johan Tibell wrote:


> > If UTF-16 is what's used by everyone else (how about Java? Python?) I
> > think that's a strong reason to use it. I don't know Unicode well
> > enough to say otherwise.
> 
> I disagree.  I realize I'm a dissenter in this regard, but my position
> is: excellent Unix support first, portability second, excellent support
> for Win32/MacOS a distant third.  That seems to be the opposite of every
> language's position.  Unix absolutely needs UTF-8 for backward
> compatibility.

I think you're talking about different things: internal vs. external
representations.

Certainly we must support UTF-8 as an external representation. The choice of
internal representation is independent of that. It could be [Char] or some
memory-efficient packed format in a standard encoding like UTF-8, UTF-16 or
UTF-32. The choice depends mostly on ease of implementation and performance.
Some formats are easier or faster to process, but there are also conversion
costs, so in some use cases there is a performance benefit to the internal
representation being the same as the external representation.
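
To illustrate the conversion-cost point: if the internal representation were
UTF-8 and the input is also UTF-8, ingesting it comes down to a validation
pass. The following is only a rough sketch over a plain [Word8] list; the name
isValidUtf8 is made up for illustration, and a real library would work on a
packed array rather than a list.

```haskell
import Data.Word (Word8)

-- True when the byte list is well-formed UTF-8. Rejects overlong forms,
-- encoded surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
isValidUtf8 :: [Word8] -> Bool
isValidUtf8 [] = True
isValidUtf8 (b0 : rest)
  | b0 < 0x80                = isValidUtf8 rest        -- ASCII, one byte
  | b0 >= 0xC2 && b0 <= 0xDF = continue 1 0x80 0xBF    -- two-byte sequence
  | b0 == 0xE0               = continue 2 0xA0 0xBF    -- no overlong 3-byte
  | b0 == 0xED               = continue 2 0x80 0x9F    -- no surrogates
  | b0 >= 0xE1 && b0 <= 0xEF = continue 2 0x80 0xBF    -- other 3-byte
  | b0 == 0xF0               = continue 3 0x90 0xBF    -- no overlong 4-byte
  | b0 >= 0xF1 && b0 <= 0xF3 = continue 3 0x80 0xBF
  | b0 == 0xF4               = continue 3 0x80 0x8F    -- at most U+10FFFF
  | otherwise                = False
  where
    -- Expect n continuation bytes. The first must lie in [lo, hi];
    -- the remaining ones must lie in [0x80, 0xBF].
    continue n lo hi =
      case splitAt n rest of
        (c1 : cs, rest')
          | length cs == n - 1
          , c1 >= lo, c1 <= hi
          , all isCont cs -> isValidUtf8 rest'
        _ -> False
    isCont b = b >= 0x80 && b <= 0xBF
```

With a UTF-16 internal format the same input would additionally have to be
transcoded into a new buffer.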

So, the obvious choices of internal representation are UTF-8 and UTF-16. UTF-8
has the advantage of being the same as a common external representation, so
conversion is cheap (we only need to validate rather than copy). Compared to
UTF-16, UTF-8 is more compact for western languages but less compact for
eastern languages. UTF-8 is a more complex encoding in the common cases than
UTF-16. In the common case UTF-16 is effectively fixed width. According to the
ICU implementers this has speed advantages (probably due to branch prediction
and smaller code size).
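
The compactness trade-off can be made concrete by counting bytes per code
point. This is a small sketch using only Data.Char; the function names are
invented here, and it assumes the input String contains no lone surrogate
code points.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Bytes needed to encode one code point in UTF-16: one 16-bit unit inside
-- the Basic Multilingual Plane, a surrogate pair (two units) outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) of a whole string, in bytes.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))
```

For example, encodedSizes "hello" gives (5,10), while the three-character
Japanese string "\x65E5\x672C\x8A9E" gives (9,6). Only code points outside the
Basic Multilingual Plane take more than one 16-bit unit, which is why UTF-16
is effectively fixed width in the common case.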

One solution is to do both and benchmark them.

Duncan