In message <[EMAIL PROTECTED]> "Justin Bailey" <[EMAIL PROTECTED]> writes:

> On 9/20/07, Duncan Coutts <[EMAIL PROTECTED]> wrote:
> > A lazy bytestring is a list of strict bytestrings which externally
> > looks like one big string. Could you not just use a lazy bytestring
> > and its take and drop functions? Perhaps you can help me understand
> > what it is you're trying to do?
>
> I'm working on the ICFP contest from this year, and the algorithm
> frequently prepends long strings to the front of the "DNA" string
> being processed. I originally worked only with a lazy bytestring, but
> its 'append' wasn't fast enough, so I'm trying this representation.
But you do realise it's exactly the same representation. Append for a
lazy bytestring is O(n) in the number of chunks n, and this will also be
true for your 'new' representation.

> Your email makes me think I should work directly with a list of strict
> bytestrings,

That's exactly what a lazy bytestring is. You'll not get any performance
improvement without changing the data representation. A list is not good
enough for what you want to do, because so many operations are O(n) in
the number of chunks.

> but in that case what happens when I want to take a large
> chunk of strings out of the middle of the list? Would that be an O(n)
> operation?

Yes. That's exactly the problem. What you want, rather than a list of
strict bytestrings, is a tree of strict bytestrings. Specifically, you
want a fingertree of strict bytestrings:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fingertree

  newtype ByteSequence = BS (FingerTree (Sum Int) Strict.ByteString)

  instance Measured (Sum Int) Strict.ByteString where
    measure = Sum . Strict.length

You'll have to wrap the operations you need (like split, take, drop and
append) to make the ByteSequence look like a single sequence of bytes
rather than a sequence of chunks. You probably want to enforce an
invariant that no chunk is ever empty (as we do for lazy bytestrings).
For best performance over a large number of inserts and deletes you
might need to implement merging of adjacent small blocks so that the
block size does not degrade too much.

An alternative structure, if you tend to do lots of inserts and deletes
near the same spot, is a zipper structure with a cursor. I'm not so sure
what the best structure for that might be; perhaps just a pair of finger
trees giving the parts of the sequence before and after the insertion
point (since finger trees give fast access to the ends but slower
O(log n) access to points n chunks from the closer end).
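[Editor's note: the message above only sketches the ByteSequence type. As
an illustration, here is one way the wrapped operations Duncan mentions
(append, split, take, drop) might look, using the fingertree package's
Data.FingerTree API. The function names (fromChunk, splitAtBS, takeBS,
dropBS, toStrict) are invented for this sketch, not part of the original
message or any library.]

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import qualified Data.ByteString.Char8 as Strict
import qualified Data.FingerTree as FT
import Data.FingerTree (FingerTree, Measured (..), ViewL (..), (<|), (|>), (><))
import Data.Foldable (toList)
import Data.Monoid (Sum (..))

-- A sequence of bytes stored as a finger tree of non-empty chunks,
-- annotated with the total byte count of each subtree.
newtype ByteSequence = BS (FingerTree (Sum Int) Strict.ByteString)

instance Measured (Sum Int) Strict.ByteString where
  measure = Sum . Strict.length

fromChunk :: Strict.ByteString -> ByteSequence
fromChunk c
  | Strict.null c = BS FT.empty         -- invariant: no empty chunks
  | otherwise     = BS (FT.singleton c)

len :: ByteSequence -> Int
len (BS t) = getSum (measure t)

-- O(log n) in the number of chunks, vs O(n) for a chunk list.
append :: ByteSequence -> ByteSequence -> ByteSequence
append (BS a) (BS b) = BS (a >< b)

-- Split at byte index i, cutting the chunk that straddles the boundary.
splitAtBS :: Int -> ByteSequence -> (ByteSequence, ByteSequence)
splitAtBS i (BS t) =
  let (l, r) = FT.split (\(Sum n) -> n > i) t   -- l holds at most i bytes
      k      = i - getSum (measure l)           -- bytes still owed from r
  in case FT.viewl r of
       c :< r' | k > 0 ->
         -- k < Strict.length c here, so both halves are non-empty
         let (c1, c2) = Strict.splitAt k c
         in (BS (l |> c1), BS (c2 <| r'))
       _ -> (BS l, BS r)

takeBS, dropBS :: Int -> ByteSequence -> ByteSequence
takeBS n = fst . splitAtBS n
dropBS n = snd . splitAtBS n

toStrict :: ByteSequence -> Strict.ByteString
toStrict (BS t) = Strict.concat (toList t)
```

With this, prepending a long string is a cheap `append (fromChunk s)`,
and cutting a large slice out of the middle is two O(log n) splits plus
one O(log n) append, rather than an O(n) walk down a chunk list.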
Have fun :-)

I should point out that other people who did this year's ICFP contest
have also looked at structures like this (though mostly after the
contest finished), so you might want to talk or collaborate with them.

Duncan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe