Hi all, I have a use case where I need to chunk a stream of values, i.e. turn a `Producer a m ()` to a `Producer (Chunk a) m ()` where `Chunk` is something like an in-memory vector.
This is a small complete example of what I was trying to do: import Control.Lens (view) import Control.Monad import qualified Data.List as L import Data.Vector (Vector) import qualified Data.Vector as V import qualified Data.Vector.Mutable as VM import Pipes import Pipes.Group main = let nums = each $ L.replicate 1000000 (1 :: Int) chunks = chunkify 4096 nums in runEffect $ chunks >-> inspectChunk where -- eval each element in a chunk to whnf inspectChunk = forever $ do Chunk v <- await V.mapM_ (($!) const (return ())) v newtype Chunk a = Chunk { chunkVec :: Vector a } chunkify :: Int -> Producer a IO () -> Producer (Chunk a) IO () chunkify size p = let chunks = {-# SCC view_chunksOf #-} view (chunksOf size) p in foldsM addToChunk (newChunk size) freezeChunk chunks newChunk size = VM.new size >>= return . (0,) freezeChunk (i, v) = V.unsafeFreeze (VM.slice 0 i v) >>= return . Chunk addToChunk (i, v) x = do VM.write v i x return (i + 1, v) This works, but is very inefficient because it spends the majority of time chunking. In an ideal world, I would expect chunking the input to have minimal cost, compared to the actual computation (in this case, `inspectChunk`). Here is a profile of the above: COST CENTRE MODULE %time %alloc view_chunksOf Main 35.4 39.6 chunkify Main 23.3 19.2 main.inspectChunk Main 19.1 25.6 addToChunk Main 16.6 10.2 main.nums Main 5.3 5.1 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 156 0 0.0 0.0 100.0 100.0 CAF Main 311 0 0.0 0.0 99.7 100.0 main Main 312 1 0.0 0.0 99.7 100.0 main.nums Main 318 1 4.6 5.1 4.6 5.1 main.chunks Main 314 1 0.0 0.0 76.0 69.3 chunkify Main 315 1 23.3 19.2 76.0 69.3 freezeChunk Main 322 245 0.0 0.0 0.0 0.0 addToChunk Main 320 1000000 16.6 10.2 16.6 10.2 newChunk Main 319 1 0.0 0.3 0.0 0.3 chunkify.chunks Main 316 1 0.0 0.0 36.1 39.6 view_chunksOf Main 317 1 35.4 39.6 36.1 39.6 main.nums Main 321 0 0.7 0.0 0.7 0.0 main.inspectChunk Main 313 1 19.1 25.6 19.1 25.6 In the full program I was working on, which does more computation than just inspecting each element and throwing it away, chunking is still a huge bottleneck. I expected chunking to spend some time looking at each element and counting down from the chunkSize (like what `splitAt` does), but didn't expect that to be so expensive. Could I improve this somehow? Maybe I'm approaching this wrong? Background: this is part of a program that splits/merges/etc. streams of inputs, where "stream" could be actual streams or just Data.List lists. Embarrassingly the stream version is much slower than lists =( Thanks, -- You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscr...@googlegroups.com. To post to this group, send email to haskell-pipes@googlegroups.com.