Re: [haskell-pipes] Re: Efficient chunking

Michael Thompson Thu, 06 Aug 2015 10:32:22 -0700

The speedup from my variant of `chunksOf` (or maybe mostly `splitAt`) came
just from avoiding the use of `next` for each Int in the producer. In order
to preserve correctness, `next` wraps each step in the *base* monad (here
IO); the user then 'unwraps' with


     e <- next p
     case e of
       Left r        -> ... r ...
       Right (n, p') -> ... n ... p' ...
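
To make the pattern concrete, here is a minimal sketch of a complete loop written with `next`. The names `sumWithNext` and `example` are made up for illustration and are not from the thread; the point is only that every element costs one step in the base monad.

```haskell
import Pipes (Producer, each, next)

-- Hypothetical helper: sum a producer of Ints by calling `next` once per
-- element. Each call runs in the base monad m and hands back either the
-- final return value or the next element plus the rest of the producer.
sumWithNext :: Monad m => Producer Int m r -> m (Int, r)
sumWithNext = go 0
  where
    go acc p = do
      e <- next p
      case e of
        Left r        -> return (acc, r)
        Right (n, p') -> let acc' = acc + n
                         in acc' `seq` go acc' p'

-- Illustrative use: sum the numbers 1 to 10.
example :: IO (Int, ())
example = sumWithNext (each [1 .. 10])
```

Running `example` should give `(55, ())`.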

This is how we correctly inspect a producer. If we brazenly break into the
world of the hidden constructors, we can replace this with

    case p of 
       I.Pure r -> ... r ...
       I.Respond n f -> ... n ... f () ...
       I.M m   -> ....
  
which evades a little indirection. That is, if we look directly at the
constructors of the `Proxy X () () Int IO` monad, we can evade
some of the wrapping and unwrapping in the base monad (IO)
that `next` imposes.
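
As a complete function, the constructor-matching style might look like the sketch below. `sumDirect` and `example` are hypothetical names chosen for illustration; the code assumes `Pipes.Internal` exports the `Proxy` constructors and `closed`, as the snippets in this message do.

```haskell
import Pipes (Producer, each)
import qualified Pipes.Internal as I

-- Hypothetical helper: sum a producer of Ints by matching on the Proxy
-- constructors directly. Only the `M` case touches the base monad.
sumDirect :: Monad m => Producer Int m r -> m (Int, r)
sumDirect = go 0
  where
    go acc p = case p of
      I.Pure r      -> return (acc, r)
      I.Respond n f -> let acc' = acc + n
                       in acc' `seq` go acc' (f ())
      I.M m         -> m >>= go acc
      -- A Producer can never request: its upstream request type is X.
      I.Request v _ -> I.closed v

-- Illustrative use: sum the numbers 1 to 10.
example :: IO (Int, ())
example = sumDirect (each [1 .. 10])
```

Running `example` should give `(55, ())`.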

Even within the pipes package Gabriel avoids this outside `Pipes.Internal`. 
 
As a practice this sort of pattern-matching would be depraved since
e.g. you could write 

    isSecretlyPure :: Producer a m r -> Bool
    isSecretlyPure (I.M _) = False
    isSecretlyPure _ = True

But the user is not supposed to see anything that would
for example distinguish 

    return r  
       ~
    I.Pure r 
       ~
    I.M (return (I.Pure r))

and the like. The monad instance is only
correct if these terms are indistinguishable, and 
similarly for more complicated cases.
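
A small sketch of the problem, using the terms above: the two producers below should be the same as far as the monad laws are concerned, and `next` cannot tell them apart, but the constructor match can. The names `viaReturn`, `viaM` and `endsImmediately` are made up for this illustration.

```haskell
import Pipes (Producer, next)
import qualified Pipes.Internal as I

-- The function from the message: it inspects the outermost constructor.
isSecretlyPure :: Producer a m r -> Bool
isSecretlyPure (I.M _) = False
isSecretlyPure _       = True

-- Two spellings of "do nothing and return ()".
viaReturn, viaM :: Producer Int IO ()
viaReturn = return ()
viaM      = I.M (return (I.Pure ()))

-- Hypothetical observer that sticks to the public API: does the producer
-- finish without yielding anything?
endsImmediately :: Monad m => Producer a m r -> m Bool
endsImmediately p = do
  e <- next p
  return (either (const True) (const False) e)
```

Here `isSecretlyPure viaReturn` is `True` while `isSecretlyPure viaM` is `False`, whereas `endsImmediately` returns `True` for both.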

The complexity introduced by FreeT in this case
is inversely proportional to the length of the vectors
you are writing. Or am I wrong? The use of `next`
(via `view_chunksOf`) affects each Int in the initial producer.
Each use of `view_chunksOf` covers n uses of `next`,
where n is the length of the vectors you are making.
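
As a rough illustration of that counting argument (a sketch of the arithmetic only, not a measurement): with `total` elements and chunks of length n, there is one `next` per element but only about total/n chunk boundaries. `stepCounts` is a hypothetical name.

```haskell
-- Hypothetical helper: (uses of `next`, number of chunks) for a producer
-- of `total` elements split into chunks of length n (n > 0).
stepCounts :: Int -> Int -> (Int, Int)
stepCounts total n = (total, (total + n - 1) `div` n)
```

For a million Ints in vectors of length 1000, `stepCounts 1000000 1000` is `(1000000, 1000)`, so the per-chunk cost is spread over 1000 elements.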

Gabriel will rightly be furious, but you will detect a small improvement
if you pattern match directly in the one case where you use `next` in the
way `splitAt` does:

    addToChunk size ref  =  loop where 
      loop p =  case p of 
        I.Pure r       -> return (return r)
        I.Request v _  -> I.closed v
        I.M         m  -> m >>= addToChunk size ref
        I.Respond a f  -> do (i, chunk) <- readIORef ref
                             VM.unsafeWrite chunk i a
                             writeIORef ref (i+1, chunk)
                             if   i + 1 >= size
                             then return (f ())
                             else loop (f ())

but the improvement isn't worth the depravity; the main problem is the
initial wrapping of these myriad little Ints in the complicated Proxy
constructors. All of the above considerations just pertain to constant
factors at each Int (next) and at each vector break (FreeT). The kind of
factor in question is characteristic of pipes.

It occurs to me you might take a look at

    https://www.fpcomplete.com/user/snoyberg/library-documentation/vectorbuilder
    https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yielding
    https://github.com/nilcons/gists/blob/master/vectorBuilder/README.md

I didn't have the patience to figure out what was going on, but looking at
it now, I wonder if it would not be easier to do in pipes, where you would
not be bent on writing a

     chunkVec :: Int -> Pipe a (Vector a) m r

but just

    chunkVec :: Int -> Producer a m r -> Producer (Vector a) m r
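
A minimal sketch of a function with this second type, to show that the shape is easy to write with `next`. This is not the vectorBuilder technique and not the mutable-vector code from earlier in the thread: it collects each chunk in a list and calls `V.fromList`, so it makes no claim about speed. The helper names `go`, `collect` and `example` are illustrative.

```haskell
import Pipes (Producer, each, lift, next, yield)
import qualified Pipes.Prelude as P
import qualified Data.Vector as V

-- Group the elements of a producer into vectors of at most `size`
-- elements; the last vector may be shorter. No empty vector is emitted.
chunkVec :: Monad m => Int -> Producer a m r -> Producer (V.Vector a) m r
chunkVec size source
  | size <= 0 = error "chunkVec: size must be positive"
  | otherwise = go source
  where
    go p = do
      (acc, end) <- lift (collect size [] p)
      if null acc then return () else yield (V.fromList (reverse acc))
      case end of
        Left r   -> return r
        Right p' -> go p'

    -- Pull up to k elements with `next`, accumulating them in reverse.
    collect 0 acc p = return (acc, Right p)
    collect k acc p = do
      e <- next p
      case e of
        Left r        -> return (acc, Left r)
        Right (a, p') -> collect (k - 1) (a : acc) p'

-- Illustrative use: chunk 1..7 into vectors of length 3.
example :: IO [V.Vector Int]
example = P.toListM (chunkVec 3 (each [1 .. 7]))
```

Running `example` should give three vectors holding `[1,2,3]`, `[4,5,6]` and `[7]`.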


One subtlety is that Conduit is already using a codensity transformation 
internally

    newtype ConduitM i o m r = ConduitM
         { unConduitM :: forall b.
                        (r -> Pipe i i o () m b) -> Pipe i i o () m b
         }

and the construction makes use of this. It might be that the main
difficulty would be to write something with a type like

        chunkVec :: Int -> Codensity (Producer a m) r
                 -> Codensity (Producer (Vector a) m) r

But I just got confused again looking at it, so I don't know ...
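
For reference, a sketch of what `Codensity` means here, defined locally so the block stands alone (the kan-extensions package has a similar type and a `lowerCodensity`; `liftCodensity` and `viaLowering` are names invented for this sketch). It only shows that a function of the shape above can be obtained by lowering, transforming, and lifting again. That route gives up whatever the codensity form was buying, so it is not a solution to the difficulty, just a way to see the types line up.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Continuation-passing wrapper around a monad f.
newtype Codensity f a = Codensity
  { runCodensity :: forall b. (a -> f b) -> f b }

instance Functor (Codensity f) where
  fmap g (Codensity k) = Codensity (\c -> k (c . g))

instance Applicative (Codensity f) where
  pure a = Codensity (\c -> c a)
  Codensity kf <*> Codensity ka = Codensity (\c -> kf (\g -> ka (c . g)))

instance Monad (Codensity f) where
  Codensity k >>= g = Codensity (\c -> k (\a -> runCodensity (g a) c))

-- Embed an ordinary computation (hypothetical name).
liftCodensity :: Monad f => f a -> Codensity f a
liftCodensity m = Codensity (m >>=)

-- Run the continuation-passing form back down to f.
lowerCodensity :: Applicative f => Codensity f a -> f a
lowerCodensity (Codensity k) = k pure

-- Hypothetical helper: apply a transformation on the underlying monads
-- by lowering first and lifting afterwards.
viaLowering :: (Monad f, Monad g)
            => (f r -> g r) -> Codensity f r -> Codensity g r
viaLowering t = liftCodensity . t . lowerCodensity
```

With a `chunkVec n :: Producer a m r -> Producer (Vector a) m r` in hand, `viaLowering (chunkVec n)` has the type written above.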


