Re: [haskell-pipes] Re: Parallelizing fold of Producer

Gabriel Gonzalez Sun, 08 Nov 2015 08:05:28 -0800

Let's decompose this into solving two smaller problems:


* How to process the group in blocks of 100
* How to parallelize each block

The first step would be a function of type:

    groupAndFold :: Monad m => Producer Block m r -> Producer BlockState m r

And this would use the `pipes-group` library to partition the stream into groups of 100 elements and then fold each group.
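As a rough, untested sketch of that partition-and-fold step: the version below generalizes `groupAndFold` to take the step function, the initial accumulator and the group size as arguments (those extra parameters and the `demoSums` example are assumptions made for illustration, not part of the signature above). It relies on `chunksOf` and `folds` from `Pipes.Group` and `view` from `Lens.Family`.

```haskell
import Lens.Family (view)
import Pipes (Producer, each)
import Pipes.Group (chunksOf, folds)
import qualified Pipes.Prelude as P

-- Hypothetical generalization: fold every group of `n` elements with
-- `step`, starting each group from `begin`, and yield one result per group.
groupAndFold
    :: Monad m
    => (s -> b -> s)   -- how to fold one element into the accumulator
    -> s               -- initial accumulator for every group
    -> Int             -- group size
    -> Producer b m r
    -> Producer s m r
groupAndFold step begin n p = folds step begin id (view (chunksOf n) p)

-- Small pure example: sum a stream of Ints in groups of 3.
demoSums :: [Int]
demoSums = P.toList (groupAndFold (+) 0 3 (each [1 .. 7]))
```

For the problem in this thread, the call would look something like `groupAndFold processBlock emptyBlockState 100`, where `emptyBlockState` is a hypothetical starting state for each group.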

Notice that we haven't actually evaluated anything by doing that partition-and-fold step. Our final `Producer BlockState m r` is still a producer of unevaluated thunks. So that leads us to the second step, which is how to parallelize each block. The type of that would be something like:


    parallelize :: Monad m => Producer a m r -> Producer a m r

Now we have a smaller and clearer problem to solve: how do we take an arbitrary `Producer` that is yielding large unevaluated thunks and speculatively evaluate them ahead of time so that they are ready when you finally need them?

This is actually difficult to do, though, because you can't even *begin* to evaluate the Nth element that the `Producer` yields without triggering all side effects preceding that element in the `Producer`.


Let's take a simple example to illustrate the problem:

    example :: Producer Int IO ()
    example = do
        yield someExpensiveComputation1
        str1 <- lift getLine
        yield (someExpensiveComputationThatDependsOn str1)
        str2 <- lift getLine
        yield (anotherExpensiveComputationThatDependsOn str2)

The issue is that we can't even begin to evaluate the `anotherExpensiveComputationThatDependsOn str2` until we know the value of `str2`, but that requires forcing all effects leading up to the second `getLine` command. So the only way we can speculatively evaluate a `Producer` is to force the entire producer or at least force large chunks of the `Producer` at a time (i.e. force 20 `BlockState`s worth of computation at a time).


So let's refine the type of our `parallelize` function:

    parallelize :: Monad m => Int -> Producer a m r -> Producer a m r

The first argument will be how many elements to materialize and then speculatively compute at one time. This will materialize the `Producer` in chunks of the given size, spark off their evaluation and then re-yield them.


    import Control.Foldl (list, purely)
    import Control.Parallel (par)
    import Lens.Family.State.Strict (zoom)
    import Pipes (Producer, each, lift)
    import Pipes.Parse (foldAll, runStateT, splitAt)
    import Prelude hiding (splitAt)

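    parallelize :: Monad m => Int -> Producer a m r -> Producer a m r
    parallelize n p = do
        -- Drain the next `n` elements of `p` into a list
        let parser = zoom (splitAt n) (purely foldAll list)
        (as, p') <- lift (runStateT parser p)
        -- Spark the chunk, re-yield its elements, then recurse on the remainder
        as `par` (each as >> parallelize n p')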

I haven't yet tested that the above code works; I only verified that it type-checks. However, that is probably close to the best you will be able to do using `pipes`.
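Two things may be worth adjusting when trying this out. As far as I can tell, the recursion in `parallelize` never stops once the `Producer` is exhausted, and `par` on the list only sparks it to weak head normal form rather than sparking each element. The variant below is a sketch under those assumptions, not a tested implementation: the name `parallelizeChunks` and the `demoChunks` example are hypothetical, and it swaps `par` for `parList rseq` from `Control.Parallel.Strategies` (in the `parallel` package), which sparks each element of the chunk to weak head normal form.

```haskell
import Control.Foldl (list, purely)
import Control.Parallel.Strategies (parList, rseq, using)
import Lens.Family.State.Strict (zoom)
import Pipes (Producer, each, lift)
import Pipes.Parse (foldAll, runStateT, splitAt)
import qualified Pipes.Prelude as P
import Prelude hiding (splitAt)

-- Hypothetical variant of `parallelize`: materialize up to `n` elements,
-- spark each of them, re-yield them, and stop when nothing is left.
parallelizeChunks :: Monad m => Int -> Producer a m r -> Producer a m r
parallelizeChunks n p = do
    let parser = zoom (splitAt n) (purely foldAll list)
    (as, p') <- lift (runStateT parser p)
    if null as
        -- Nothing was drawn: hand control to the remainder, which carries
        -- the final return value, instead of recursing again.
        then p'
        else do
            -- `parList rseq` creates one spark per element of the chunk.
            each (as `using` parList rseq)
            parallelizeChunks n p'

-- Small pure example: the elements come out unchanged and in order.
demoChunks :: [Int]
demoChunks = P.toList (parallelizeChunks 3 (each [1 .. 10]))
```

Sparks only run in parallel when the program is built with `-threaded` and run with `+RTS -N`. Since `rseq` stops at weak head normal form, a `BlockState` with lazy fields would need a deeper strategy, such as `rdeepseq` with an `NFData` instance.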


On 11/7/2015 11:24 AM, Rune Kjær Svendsen wrote:

Yes. I see that I was less clear in my original message than I could have been. The slow/fast function isn't really relevant. Allow me to start over :)

I have a function:

    processBlock :: BlockState -> Block -> BlockState

which folds a Block into a BlockState which has been accumulated from Blocks previous to the Block in question.
As such, this function only allows sequential operation on a list/stream of blocks.

I also have a function which combines two BlockStates into one:

    consolidateBlockState :: BlockState -> BlockState -> BlockState

where the first BlockState is accumulated from Blocks prior to the Blocks from which the latter BlockState is made.
This allows me to turn the otherwise sequential operation into a parallel one. I just can't figure out how to get "par" and "pseq" working inside a Pipe, in order to fold multiple sets of Blocks into BlockStates in parallel (on multiple CPU cores).

/Rune

On 07 Nov 2015, at 19:49, Michael Thompson <practical.wis...@gmail.com> wrote:

Right, I think I misunderstood the original message as saying that there was a slow function

    Block -> BlockState

and that one could fold the BlockStates up monoidally. But I guess the slow function is

    BlockState -> Block -> BlockState

