Chris Kuklewicz wrote:

Can I be among the first to ask that any Unicode variant of ByteString use a
recognized encoding?

<snip>

In reading all the poke/peek functions I did not see anything that your tag bits
accomplish that the tag bits in utf-8 do not, except that you want to write only
a single routine for the poke/peek forwards and backwards operations instead of
two routines.  It is definitely more compact in the worst case, and more "Once
And Only Once", but at a very high cost of incompatibility.

The reason for inventing my own encoding is that it is easier to use and takes less space than UTF-8. The only advantage UTF-8 has is that it can be read and written directly. I guess this is a trade-off: faster manipulation and smaller storage versus simpler and faster I/O. I have not benchmarked it either way, so this is just guesswork for now.

Fortunately the entire library can be easily converted to use a different encoding by just changing the peekChar/pokeChar functions.
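To illustrate the point about peekChar/pokeChar, here is a minimal sketch of how the rest of a library could stay encoding-agnostic if only those two functions know the byte layout. It is not the actual CompactString code: the Encoding record, the list-of-Word8 representation, and the names utf8, packWith and unpackWith are all assumptions made for this example, and the UTF-8 decoder is simplified (it does not reject overlong forms or surrogates).

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface: everything encoding-specific lives in these two fields.
data Encoding = Encoding
  { pokeChar :: Char -> [Word8]                   -- write one character
  , peekChar :: [Word8] -> Maybe (Char, [Word8])  -- read one character, return the rest
  }

-- A simplified UTF-8 instance of the interface (no overlong/surrogate checks).
utf8 :: Encoding
utf8 = Encoding { pokeChar = enc, peekChar = dec }
  where
    enc c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 .|. hi 6, cont 0]
      | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        n      = ord c
        hi s   = fromIntegral (n `shiftR` s)
        cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

    dec [] = Nothing
    dec (b:bs)
      | b < 0x80  = Just (chr (fromIntegral b), bs)
      | b < 0xC0  = Nothing                 -- stray continuation byte
      | b < 0xE0  = multi 1 (b .&. 0x1F)
      | b < 0xF0  = multi 2 (b .&. 0x0F)
      | b < 0xF8  = multi 3 (b .&. 0x07)
      | otherwise = Nothing                 -- not a valid lead byte
      where
        multi k lead
          | length conts == k && all isCont conts && code <= 0x10FFFF
                      = Just (chr code, rest)
          | otherwise = Nothing
          where
            (conts, rest) = splitAt k bs
            code = foldl (\acc x -> (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F))
                         (fromIntegral lead) conts

    isCont x = x .&. 0xC0 == 0x80

-- Generic operations written only in terms of the Encoding record.
packWith :: Encoding -> String -> [Word8]
packWith e = concatMap (pokeChar e)

unpackWith :: Encoding -> [Word8] -> Maybe String
unpackWith _ [] = Just []
unpackWith e bs = do
  (c, rest) <- peekChar e bs
  (c :) <$> unpackWith e rest
```

Swapping in a different encoding would then mean supplying another Encoding value; packWith and unpackWith would not change.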

One of the biggest wins with a Unicode ByteString will be the ability to
transfer the buffer directly to and from the disk and network.  Your code will
always need the data to be rewritten both incoming and outgoing.

The ideal case would be the ability to load different encodings via import
statements while using the same API.

I was hoping that there would be only a single string type, with different encodings handled by functions:
 > encode :: CompactString -> ByteString
 > decode :: ByteString -> CompactString

This is important if it is not known beforehand how a file is encoded. For example, on Windows, Unicode files are either UTF-8 or UTF-16, identified by a byte order mark.
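As a rough sketch of the byte order mark check described above, the function below inspects the first bytes of a ByteString and strips the mark if one is found. The Detected type and the name detectBOM are made up for this example and are not part of any existing library; a real decode function would go on to dispatch on the result. UTF-32 marks are deliberately ignored here, so a UTF-32LE file would be misreported as UTF-16LE.

```haskell
import qualified Data.ByteString as B

-- Hypothetical result type for this example.
data Detected = UTF8 | UTF16LE | UTF16BE | Unknown
  deriving (Eq, Show)

-- Return the detected encoding and the input with the byte order mark removed.
detectBOM :: B.ByteString -> (Detected, B.ByteString)
detectBOM bs
  | bom [0xEF, 0xBB, 0xBF] = (UTF8,    B.drop 3 bs)
  | bom [0xFF, 0xFE]       = (UTF16LE, B.drop 2 bs)
  | bom [0xFE, 0xFF]       = (UTF16BE, B.drop 2 bs)
  | otherwise              = (Unknown, bs)   -- no mark: the caller has to guess
  where
    bom ws = B.pack ws `B.isPrefixOf` bs
```

With a single string type, the result of detectBOM could select which decode function to apply to the remaining bytes.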

Twan
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell