Duncan Coutts <[EMAIL PROTECTED]> writes:

>>> Because I'm writing the Unicode-friendly ByteString =p

> He's designing a proper Unicode type along the lines of ByteString.

So - storing 21-bit code points instead of 8-bit quantities? Or storing
whatever representation arrives from the input, and providing a nice
interface on top?
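To make the two options concrete, here is a rough sketch of what I mean
(FixedText, EncodedText and friends are made-up names, not from any
existing library):

  import qualified Data.ByteString     as S
  import qualified Data.Vector.Unboxed as V
  import Data.Word (Word32)

  -- Option 1: fixed-width storage, one slot per code point.  Unicode
  -- code points fit in 21 bits, so Word32 is the smallest convenient
  -- cell.  O(1) indexing, but up to 4x the space of UTF-8 for
  -- mostly-ASCII text.
  newtype FixedText = FixedText (V.Vector Word32)

  -- Option 2: keep whatever bytes arrived, remember the encoding, and
  -- put a Char-based interface on top.  Compact, but indexing is O(n)
  -- for variable-length encodings like UTF-8.
  data Encoding    = Latin1 | UTF8 | UTF16LE
  data EncodedText = EncodedText !Encoding !S.ByteString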
>> Perhaps I'm not understanding. Why wouldn't you use ByteString for I/O,

Like everybody else, my first reaction is to put a layer (like Char8) on
top of lazy bytestrings. With a variable-length encoding you lose direct
indexing, but I think indexing is not a very common operation, and if you
need it, you should convert to a fixed-length encoding instead.

Since a ByteString is basically a (pointer to array, offset, length)
triple, it should be relatively easy to ensure that you don't break a
wide character between chunks, by adjusting the length field (which
doesn't have to match the actual array length).
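For what it's worth, both pieces can be sketched in a few lines, assuming
UTF-8 and skipping all validation (unconsUtf8 and splitAtBoundary are
made-up names, not a proposed API):

  import qualified Data.ByteString      as S
  import qualified Data.ByteString.Lazy as L
  import Data.Bits ((.&.), (.|.), shiftL)
  import Data.Char (chr)
  import Data.Word (Word8)

  -- Bytes in a UTF-8 sequence, judged from its first byte (assumes the
  -- argument really is a sequence-start byte).
  utf8Len :: Word8 -> Int
  utf8Len w
    | w < 0x80  = 1
    | w < 0xE0  = 2
    | w < 0xF0  = 3
    | otherwise = 4

  -- A Char8-style uncons that decodes one code point from the front of
  -- a lazy ByteString.  A real version would check continuation bytes
  -- and reject overlong forms.
  unconsUtf8 :: L.ByteString -> Maybe (Char, L.ByteString)
  unconsUtf8 bs = do
    (w, _) <- L.uncons bs
    let n           = utf8Len w
        (pre, rest) = L.splitAt (fromIntegral n) bs
        step acc c  = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
        firstMask   = case n of 2 -> 0x1F; 3 -> 0x0F; _ -> 0x07
        cp = case L.unpack pre of
               [b]      -> fromIntegral b
               (b : cs) -> foldl step (fromIntegral b .&. firstMask) cs
               []       -> 0   -- impossible: uncons succeeded above
    Just (chr cp, rest)

  -- Split a strict chunk so the first piece never ends mid-sequence:
  -- walk back from the requested split point to the nearest start byte
  -- (anything that is not 10xxxxxx).  Since a chunk is just a
  -- (pointer, offset, length) triple, shortening it is free.
  splitAtBoundary :: Int -> S.ByteString -> (S.ByteString, S.ByteString)
  splitAtBoundary i bs
    | i >= S.length bs = (bs, S.empty)
    | otherwise        = S.splitAt (back i) bs
    where
      back 0 = 0
      back j | S.index bs j .&. 0xC0 == 0x80 = back (j - 1)  -- continuation
             | otherwise                     = j

splitAtBoundary is exactly the length-adjustment trick: the first piece
simply stops a few bytes early, and the spare bytes become the start of
the next chunk.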
> The reason we do not want to re-use ByteString as the underlying
> representation is because they're not good for short strings and we
> expect that for Unicode text (more than arbitrary blobs of binary data)
> people will want efficient short strings.

I guess this is where I don't follow: why would you need more short
strings for Unicode text than for ASCII or 8-bit Latin text?

-k
--
If I haven't seen further, it is by standing in the footprints of giants