On Thu 2008-05-29 18:45, Chad Scherrer wrote:
> Jed Brown <jed <at> 59A2.org> writes:
> > Uh, ByteString is Unicode-agnostic.  ByteString.Char8 is not.  So why not do
> > IO with lazy ByteString and parse into your own representation (which might
> > look a lot like StorableVector)?
> 
> One problem you might run into doing it this way is if a wide character is
> split between two different arrays. In that case you have to do some
> post-processing to put the pieces back together. More efficient, I think, if
> you could force a given alignment when reading in the lazy bytestring. But
> there's not a way to do that, is there?

Unless you are reading UTF-32, you won't know what alignment you want until you
get there.  If I remember correctly, the default block size is nicely aligned so
that in practice you shouldn't have to worry about a chunk ending with weird
alignment.  However, such alignment issues shouldn't affect you unless you are
using the internal interface.  If you want fast indexing, you have to parse one
character at a time anyway so you won't gain anything by unsafe casting (or
memcpy) into your data structure.
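
To illustrate the point about chunk boundaries, here is a minimal sketch that
parses UTF-8 one character at a time through the public lazy ByteString
interface. The names decodeUtf8 and decodeLazy are hypothetical helpers written
for this example, not part of the bytestring library, and the decoder is
deliberately simplified: it assumes UTF-8 input and does not reject overlong
encodings. Because it only sees the byte stream from BL.unpack, it never
notices where one chunk ends and the next begins.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8 one character at a time.
-- Malformed or truncated sequences become U+FFFD; overlong forms
-- are not rejected (a simplification for this sketch).
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- Read n continuation bytes after a lead byte and combine the payload bits.
    multi :: Int -> Word8 -> String
    multi n lead =
      let (cont, rest) = splitAt n bs
          code = foldl step (fromIntegral lead) cont :: Int
      in if length cont == n && all isCont cont && code <= 0x10FFFF
           then chr code : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs

    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80

    step :: Int -> Word8 -> Int
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

-- Hypothetical helper: the public interface presents the chunks as one
-- stream of bytes, so the chunk layout does not matter here.
decodeLazy :: BL.ByteString -> String
decodeLazy = decodeUtf8 . BL.unpack
```

For example, the two-byte encoding of U+00E9 (0xC3 0xA9) can be split across
chunks with BL.fromChunks [B.pack [0x61, 0xC3], B.pack [0xA9, 0x62]], and
decodeLazy gives the same three-character string as it does for the same four
bytes in a single chunk.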

Jed
