Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

Twan van Laarhoven Mon, 05 Feb 2007 04:14:27 -0800

Chris Kuklewicz wrote:


Can I be among the first to ask that any Unicode variant of ByteString use a
recognized encoding?

<snip>

In reading all the poke/peek functions I did not see anything that your tag bits
accomplish that the tag bits in utf-8 do not, except that you want to write only
a single routine for the poke/peek forwards and backwards operations instead of
two routines.  It is definitely more compact in the worst case, and more "Once
And Only Once", but at a very high cost of incompatibility.
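To make the comparison concrete, here is a minimal Haskell sketch of the UTF-8 tag bits. It works on plain lists of bytes rather than a ByteString buffer, and the names `encodeUtf8Char`, `isContinuation` and `prevCharStart` are hypothetical, not part of Data.CompactString. Moving forwards, the lead byte's tag gives the sequence length. Moving backwards relies on the separate continuation tag instead, which is why UTF-8 needs distinct forward and backward routines.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Char as UTF-8. The high bits of the lead byte are a tag giving
-- the sequence length: 0xxxxxxx (1 byte), 110xxxxx (2), 1110xxxx (3),
-- 11110xxx (4). Every following byte carries the tag 10xxxxxx.
-- Surrogate code points are not rejected in this sketch.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- payload bits that go into the lead byte
    hi k = fromIntegral (n `shiftR` k)
    -- a continuation byte: tag 10 plus the next six payload bits
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- True for continuation bytes, i.e. bytes whose top two bits are 10.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Walking backwards: given an index just past a character in a valid UTF-8
-- buffer, step back over continuation bytes until reaching its lead byte.
prevCharStart :: [Word8] -> Int -> Int
prevCharStart bytes i = go (i - 1)
  where
    go j
      | j > 0 && isContinuation (bytes !! j) = go (j - 1)
      | otherwise = j
```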

The reason for inventing my own encoding is that it is easier to use and takes less space than UTF-8. The only advantage UTF-8 has is that it can be read and written directly. I guess this is a trade-off: faster manipulation and smaller storage versus simpler and faster I/O. I have not benchmarked it either way, so it is just guesswork for now.

Fortunately the entire library can be easily converted to use a different encoding by just changing the peekChar/pokeChar functions.
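A sketch of that design, assuming an encoding can be reduced to a pair of single-character routines. The hypothetical `Codec` record below holds a `pokeChar` and a `peekChar` (named after the functions mentioned above, but with list-based signatures invented for this example, not the library's real ones), and `encodeWith`/`decodeWith` are written once against it. Latin-1 and UTF-32BE are used only because they are short to implement.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical: an encoding reduced to two single-character routines.
data Codec = Codec
  { pokeChar :: Char -> [Word8]                   -- encode one character
  , peekChar :: [Word8] -> Maybe (Char, [Word8])  -- decode one, return the rest
  }

-- Latin-1: one byte per character; only U+0000..U+00FF can be encoded.
latin1 :: Codec
latin1 = Codec poke peek
  where
    poke c
      | ord c < 0x100 = [fromIntegral (ord c)]
      | otherwise     = error "latin1: character out of range"
    peek (b : rest) = Just (chr (fromIntegral b), rest)
    peek []         = Nothing

-- UTF-32BE: four bytes per character, most significant byte first.
utf32be :: Codec
utf32be = Codec poke peek
  where
    poke c = [fromIntegral (ord c `shiftR` s) | s <- [24, 16, 8, 0]]
    peek (a : b : c : d : rest)
      | n <= 0x10FFFF = Just (chr n, rest)
      where
        n = foldl (\acc x -> acc `shiftL` 8 .|. fromIntegral x) 0 [a, b, c, d]
    peek _ = Nothing  -- too few bytes, or a value beyond U+10FFFF

-- Everything else is written once, in terms of the Codec only.
encodeWith :: Codec -> String -> [Word8]
encodeWith codec = concatMap (pokeChar codec)

decodeWith :: Codec -> [Word8] -> Maybe String
decodeWith _ [] = Just []
decodeWith codec bytes = do
  (c, rest) <- peekChar codec bytes
  (c :) <$> decodeWith codec rest
```

Swapping `latin1` for `utf32be` changes the byte layout without touching `encodeWith` or `decodeWith`.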

One of the biggest wins with a Unicode ByteString will be the ability to
transfer the buffer directly to and from the disk and network.  Your code will
always need the data to be rewritten both incoming and outgoing.

The ideal case would be the ability to load different encodings via import
statements while using the same API.
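One way to read this suggestion, sketched here as an assumption rather than anything either library provides, is to make the encoding a phantom type parameter. Every encoding then shares the `pack`/`unpack` class API, generic functions are written once, and the type checker refuses to `append` strings stored in different encodings. `EncString`, `Latin1`, `UTF32BE` and the class `Encoding` are hypothetical names.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical encoding tags: empty types used only to label strings.
data Latin1
data UTF32BE

-- A buffer of bytes in encoding e; e is a phantom type parameter.
newtype EncString e = EncString [Word8] deriving (Eq, Show)

-- Each encoding supplies only the conversions; the API is the same for all.
class Encoding e where
  pack   :: String -> EncString e
  unpack :: EncString e -> String

instance Encoding Latin1 where
  -- Assumes every character is below U+0100 (higher bits are truncated).
  pack = EncString . map (fromIntegral . ord)
  unpack (EncString bs) = map (chr . fromIntegral) bs

instance Encoding UTF32BE where
  pack = EncString . concatMap bytes
    where
      bytes c = [fromIntegral (ord c `shiftR` s) | s <- [24, 16, 8, 0]]
  -- Assumes a well-formed buffer: trailing bytes are ignored, and chr
  -- throws on values above 0x10FFFF.
  unpack (EncString bs) = go bs
    where
      go (a : b : c : d : rest) =
        chr (foldl (\acc x -> acc `shiftL` 8 .|. fromIntegral x) 0 [a, b, c, d]) : go rest
      go _ = []

-- Generic operations are written once against the class.
charLength :: Encoding e => EncString e -> Int
charLength = length . unpack

-- The types forbid appending strings stored in different encodings.
append :: EncString e -> EncString e -> EncString e
append (EncString a) (EncString b) = EncString (a ++ b)
```

Selecting the encoding then becomes a type annotation, e.g. `pack "abc" :: EncString Latin1`; with one module per encoding, the same choice could instead be made by which module is imported.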

I was hoping that there would be only a single string type, with different encodings handled by functions:

 > encode :: CompactString -> ByteString
 > decode :: ByteString -> CompactString

This is important if it is not known beforehand how a file is encoded. For example, on Windows, Unicode files are either UTF-8 or UTF-16, identified by a byte order mark.
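A minimal sketch of BOM-driven decoding into a single string type, using Haskell `String` as a stand-in for CompactString and plain byte lists for ByteString. `detectBom`, `decodeText` and the helper decoders are hypothetical names. The UTF-8 decoder does not reject overlong forms, and files without a BOM are assumed to be UTF-8, which is a guess rather than something the format guarantees. UTF-32 BOMs are not handled (a UTF-32LE BOM begins with the same FF FE bytes as UTF-16LE).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

data Encoding = UTF8 | UTF16LE | UTF16BE deriving (Eq, Show)

-- Inspect the byte order mark; with no BOM, assume UTF-8.
detectBom :: [Word8] -> (Encoding, [Word8])
detectBom (0xEF : 0xBB : 0xBF : rest) = (UTF8, rest)
detectBom (0xFF : 0xFE : rest)        = (UTF16LE, rest)
detectBom (0xFE : 0xFF : rest)        = (UTF16BE, rest)
detectBom bytes                       = (UTF8, bytes)

-- Combine byte pairs into 16-bit code units in the given byte order.
codeUnits :: Encoding -> [Word8] -> Maybe [Int]
codeUnits enc (a : b : rest) = (unit :) <$> codeUnits enc rest
  where
    (hi, lo) = if enc == UTF16BE then (a, b) else (b, a)
    unit = fromIntegral hi `shiftL` 8 .|. fromIntegral lo
codeUnits _ [] = Just []
codeUnits _ _  = Nothing  -- odd number of bytes

-- Turn UTF-16 code units into characters, joining surrogate pairs.
fromUtf16 :: [Int] -> Maybe String
fromUtf16 (h : l : rest)
  | h >= 0xD800 && h < 0xDC00 && l >= 0xDC00 && l < 0xE000 =
      (chr (0x10000 + (h - 0xD800) * 0x400 + (l - 0xDC00)) :) <$> fromUtf16 rest
fromUtf16 (u : rest)
  | u >= 0xD800 && u < 0xE000 = Nothing  -- unpaired surrogate
  | otherwise = (chr u :) <$> fromUtf16 rest
fromUtf16 [] = Just []

-- Non-validating UTF-8 decoder: the lead byte's tag gives the length.
fromUtf8 :: [Word8] -> Maybe String
fromUtf8 [] = Just []
fromUtf8 (b : rest)
  | b < 0x80  = (chr (fromIntegral b) :) <$> fromUtf8 rest
  | b < 0xC0  = Nothing  -- stray continuation byte
  | b < 0xE0  = multi 1 (b .&. 0x1F)
  | b < 0xF0  = multi 2 (b .&. 0x0F)
  | b < 0xF8  = multi 3 (b .&. 0x07)
  | otherwise = Nothing
  where
    multi n lead
      | length conts == n && all isCont conts && cp <= 0x10FFFF =
          (chr cp :) <$> fromUtf8 (drop n rest)
      | otherwise = Nothing
      where
        conts = take n rest
        isCont c = c .&. 0xC0 == 0x80
        cp = foldl (\acc c -> acc `shiftL` 6 .|. fromIntegral (c .&. 0x3F))
                   (fromIntegral lead) conts

-- Decode a file's bytes into one String type, whatever the on-disk encoding.
decodeText :: [Word8] -> Maybe String
decodeText bytes = case detectBom bytes of
  (UTF8, rest) -> fromUtf8 rest
  (enc,  rest) -> codeUnits enc rest >>= fromUtf16
```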


Twan
_______________________________________________
Haskell mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell
