On Thu 2008-05-29 18:45, Chad Scherrer wrote:
> Jed Brown <jed <at> 59A2.org> writes:
> > Uh, ByteString is Unicode-agnostic. ByteString.Char8 is not. So why not do IO
> > with lazy ByteString and parse into your own representation (which might look a
> > lot like StorableVector)?
>
> One problem you might run into doing it this way is if a wide character is split
> between two different arrays. In that case you have to do some post-processing
> to put the pieces back together. More efficient, I think, if you could force a
> given alignment when reading in the lazy bytestring. But there's not a way to do
> that, is there?

Unless you are reading UTF-32, you won't know what alignment you want until you
get there. If I remember correctly, the default block size is nicely aligned so
that in practice you shouldn't have to worry about a chunk ending with weird
alignment. However, such alignment issues shouldn't affect you unless you are
using the internal interface.

If you want fast indexing, you have to parse one character at a time anyway, so
you won't gain anything by unsafe casting (or memcpy) into your data structure.

Jed
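
To illustrate the point about the public interface, here is a minimal sketch of decoding UTF-8 one character at a time from a lazy ByteString. It assumes UTF-8 input (the thread does not fix an encoding), and the names decodeOne, contBytes, decodeAll and splitExample are made up for this example. Because it only uses Data.ByteString.Lazy.uncons, it never sees where one chunk ends and the next begins, so a character split across two chunks needs no special handling.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)

-- Decode one UTF-8 code point from the front of a lazy ByteString.
-- Returns Nothing at end of input or on a malformed byte sequence.
-- Sketch only: overlong encodings and surrogates are not rejected.
decodeOne :: BL.ByteString -> Maybe (Char, BL.ByteString)
decodeOne bs = do
  (b0, rest) <- BL.uncons bs
  let lead = fromIntegral b0 :: Int
  case () of
    _ | b0 < 0x80           -> Just (chr lead, rest)
      | b0 .&. 0xE0 == 0xC0 -> contBytes 1 (lead .&. 0x1F) rest
      | b0 .&. 0xF0 == 0xE0 -> contBytes 2 (lead .&. 0x0F) rest
      | b0 .&. 0xF8 == 0xF0 -> contBytes 3 (lead .&. 0x07) rest
      | otherwise           -> Nothing

-- Consume n continuation bytes (10xxxxxx), accumulating the code point.
contBytes :: Int -> Int -> BL.ByteString -> Maybe (Char, BL.ByteString)
contBytes 0 acc rest
  | acc <= 0x10FFFF = Just (chr acc, rest)
  | otherwise       = Nothing
contBytes n acc bs = do
  (b, rest) <- BL.uncons bs
  if b .&. 0xC0 == 0x80
    then contBytes (n - 1) ((acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)) rest
    else Nothing

-- Decode lazily until the input ends or a malformed sequence is hit.
decodeAll :: BL.ByteString -> String
decodeAll bs = case decodeOne bs of
  Nothing        -> []
  Just (c, rest) -> c : decodeAll rest

-- "h", U+00E9, "!" with the two bytes of U+00E9 (0xC3 0xA9) deliberately
-- placed in different chunks.
splitExample :: BL.ByteString
splitExample = BL.fromChunks [B.pack [0x68, 0xC3], B.pack [0xA9, 0x21]]
```

Running decodeAll on splitExample gives the same three characters as running it on the same bytes packed into a single chunk. The result is a plain String here only to keep the sketch short; the same loop could write into whatever representation you choose.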

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
