Re: [haskell-pipes] Multiple mapM and performance

Gabriel Gonzalez Sun, 11 Jun 2017 20:24:04 -0700

I think it's important to distinguish between two separate concepts:
"fusion" vs "one-pass".  "Fusion" refers to avoiding the allocation of
intermediate data structures when you transform a stream multiple times
whereas "one-pass" means that you don't traverse the sequence of elements
twice when you transform the stream multiple times (i.e. you go over the
stream in one pass).  You can have a "one-pass" implementation without
"fusion" but you cannot have "fusion" without a "one-pass" implementation.


"Fusion" is purely an optimization, meaning that whether or not an
implementation uses "fusion" only affects your program's performance but
won't affect its behavior.  However, "one-pass" is not just an
optimization: one-pass versus multiple pass changes the behavior of your
program, especially once your stream has effects like in these streaming
libraries.

Out of the two properties, "one-pass" is *much* more important.  The reason
why is that "one-pass" ensures that certain functor laws hold.  To see why,
let's consider a case where they *don't* hold, which is `Data.List.mapM`.
Normally, you mtigh expect the following functor laws to hold:

    Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g

    Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`.  For
example, the first law does not hold because the order of effects are not
the same for the left-hand and right-hand sides of the equation.  The
left-hand side of the equation interleaves the effects of `f` and `g`
whereas the right-hand side runs all of `g`'s effects first followed by all
of `f`'s effects.  The second equation is also wrong because
`Data.List.mapM` misbehaves on infinite lists:

    Data.List.mapM return (repeat x) = _|_

    id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad"

However, the streaming libraries have their own versions of `mapM` which do
obey the above functor laws.  For example, if you take the
`list-transformer` library and define:

    mapM :: (a -> m b) -> ListT m a -> ListT m b
    mapM f as = do
        a <- as
        lift (f a)

... then this *does* obey the following functor laws:

    mapM (f <=< g) = mapM f . mapM g

    mapM return = id

For the first equation, both sides of the equation interleave the effects
of `f` and `g`.  For the second equation, both sides of the equation behave
correctly on infinite `ListT` streams.  These functor laws hold because
`ListT` is has the "one-pass" property.

So to answer your question: it's not exactly the Haskell `Functor` type
class per se that is important here, but the functor laws are important
(for a more general notion of functor) in establishing why a single pass
implementation matters.


On Wed, Jun 7, 2017 at 12:02 PM, aquagnu <aqua...@gmail.com> wrote:

> > They will compile successive transformations (i.e. multiple
> > maps/mapMs/filters, for example) into a single pass over the data.
> > This is true for the
> > `streaming`/`pipes`/`list-transformer`/`logict`/`conduit`/`turtle`
> > libraries or any "ListT-done-right" implementation.  All of these
> > libraries do not rely on any special GHC optimizations or rewrite
> > rules.  They will still go over the data in one pass even if you
> > disable all optimizations.
>
> So, this is so, because of implementation of their Functor and
> Applicative instances? Am I right, that there occurs this combination
> ("single pass")?
>
> But what about Data.List and Prelude's `map` (in GHC.Base)? If I'll
> do
>
>   map fn1 (map fn2 lst)
>
> ?
>
> >
> > This is *not* true for `Control.Monad.mapM` or the `ListT` type from
> > `transformers` (a.k.a. "ListT done wrong"), which is why I wanted to
> > make sure which `mapM` you were talking about.
>
> OK, I got it.
>
>
>
> >
> > > On Jun 7, 2017, at 12:32 AM, aqua...@gmail.com wrote:
> > >
> > > Hello, Gabriel!
> > >
> > > Real module (which is compilable) is big, so I'll show on email's
> > > top extraction from it (little snippets, with imports and
> > > functions) and will add compilable module (I used it for
> > > experiments) at the end of the email - because it's not very
> > > small :) So, compilable code is on the bottom of the mail.
> > >
> > > Snippets:
> > >
> > > ...
> > >
> > > import           Streaming
> > > import qualified Streaming.Prelude as S
> > >
> > > ...
> > >
> > > (|>) = flip ($)
> > >
> > > ...
> > >
> > >     flow = items
> > >            |> S.mapM (liftIO . doRequest connection)
> > >            |> S.map (liftM fun1)
> > >            |> S.zip fun2
> > >            |> S.filter filter1
> > >            |> S.mapM fun3
> > >
> > > ...
> > >
> > > So, I supposed that serial "map"s/"zip"s/"filter"s will be compiled
> > > into one loop with serial application of functions-arguments of
> > > that "map"s/"zip"s/"filter"s. My Python background "says" me that
> > > maps/filters/etc are types, not functions, so they are
> > > "combinatorable": I mean no problem to combine serial flat
> > > iterations into one iteration. I am newbie in Haskell and I don't
> > > know is it true for Haskell... may be it should happens on
> > > optimization phase of compilation, I don't know.
> > >
> > > Does it happen automatically, on GHC side, or developer should take
> > > special actions when works with Streaming/Pipes/Conduits? Is it the
> > > same for usual maps/filters from Prelude/Data.List? I supposed 99%
> > > that GHC reduces ASTs then optimizes the result and we get at the
> > > end flat simple iterations and so on, and developer should not
> > > think about such things totally. But I had a doubt crept in, so I
> > > decided to ask...
> > >
> > > Another example is:
> > >
> > >   where flow = getItems connection
> > >                |> S.filter getInteresting
> > >                |> S.map fun1
> > >                |> fixItems fun2
> > >                |> S.mapM fun3
> > >                |> S.mapM fun4
> > > ...
> > >
> > > where fixItems iterates over stream's items and "yields" them with
> > > S.yield, like this:
> > >
> > > ...
> > >
> > > import qualified Control.Monad.Trans.State  as
> > > ST import qualified Control.Monad.Trans.Writer as W
> > >
> > > ...
> > >
> > > fixItems :: Monad m => FixItem e ps st -> Stream (Of e) (ST.StateT
> > > st m) Result -> Stream (Of e) (ST.StateT st m) Result fixItems fi =
> > > loop where loop str = do
> > >     e <- lift $ S.next str
> > >     e' <- case e of
> > >       Left err -> return err
> > >       Right (e', str') ->
> > >         let (fixes, problems) = W.runWriter $ fixItem fi e'
> > >             fix = mconcat fixes
> > >         in (lift $ ST.modify $ reportProblems fi problems e')
> > >             >> case fix of
> > >                 ItemFix ff -> (S.yield $ ff e')
> > >                 ItemSkip   -> pure ()
> > >             >> loop str'
> > >     return NoResult -- FIXME how to return stream result?
> > > ...
> > >
> > >
> > > Next is compilable module. I experimented with "fixing" of stream's
> > > items; code is terrible, I suppose, but it was experiment ;):
> > >
> > > {-# LANGUAGE FlexibleContexts #-}
> > > module Main where
> > >
> > > import           Control.Monad
> > > import           Control.Monad.Trans.Reader
> > > import           Control.Monad.Trans.State
> > > import           Control.Monad.Trans.Writer
> > > import           Data.Functor.Identity
> > > import           Data.Traversable
> > > import           Streaming
> > > import qualified Streaming.Prelude          as S
> > >
> > >
> > > type M = StateT [String] IO
> > >
> > > subgen :: Int -> S.Stream (S.Of Int) M Int
> > > subgen n =
> > >   if odd n then do { return 0 }
> > >   else do
> > >     S.yield (n*100)
> > >     S.yield (n*1000)
> > >     lift $ modify (++["!!!"])
> > >     return 0
> > >
> > > gen :: S.Stream (S.Of Int) M Int
> > > gen = do
> > >   S.yield 0; S.yield 100; S.yield 101; S.yield 102; S.yield 103;
> > > S.yield 104; S.yield 105; S.yield 106 liftIO $ putStrLn "enter x: "
> > >   x <- liftIO getLine
> > >   let n = read x::Int
> > >   S.yield n
> > >   lift $ modify (++["gen1"])
> > >   lift $ modify (++["gen2"])
> > >   return 0 -- $ do
> > >
> > > proc1 :: S.Stream (S.Of Int) M Int -> S.Stream (S.Of Int) M Int
> > > proc1 str = do
> > >   st <- lift get
> > >   loop str st
> > >   where
> > >     loop str st = do
> > >       e <- lift $ S.next str
> > >       e' <- case e of
> > >         Left err -> return err
> > >         Right (e', str') ->
> > >           (if e' == 100 then (lift $ put $ st ++ ["proc1"]) else
> > > pure ()) >> subgen e' >> (S.yield $ e' + 123) >> loop str' st
> > >       return 1
> > >
> > > data Cr = Cr {
> > >   crName  :: String
> > >   , crAge :: Int
> > >   } deriving Show
> > >
> > > data FixItem a = FixItem (a -> a) | SkipItem
> > > instance Monoid (FixItem a) where
> > >   mempty = FixItem id
> > >   mappend SkipItem _              = SkipItem
> > >   mappend _ SkipItem              = SkipItem
> > >   mappend (FixItem f) (FixItem g) = FixItem (f . g)
> > >
> > > fixWhen :: Monad m => Bool -> m (FixItem a) -> m (FixItem a)
> > > fixWhen cond act = if cond then act else return $ FixItem id
> > >
> > > fixcr1 :: Int -> Writer [String] (FixItem Int)
> > > fixcr1 n =
> > >   let (fixes, errs) = runWriter $ sequence [
> > >         fixWhen (n == 100) (tell ["panic:"++show n++"==100"] >>
> > > return SkipItem) , fixWhen (n > 100) (tell ["err1:"++show n
> > > ++">100"] >> return (FixItem (10+))) , fixWhen (n > 1) (tell
> > > ["err2:"++show n++">1"] >> return (FixItem (100+))) ]
> > >   in
> > >     writer (mconcat fixes, errs)
> > >
> > > fixItems :: (Monad m, Num a) => Stream (Of Int) (StateT [String] m)
> > > a -> Stream (Of Int) (StateT [String] m) a fixItems = loop where
> > >   loop str = do
> > >     e <- lift $ S.next str
> > >     e' <- case e of
> > >       Left err -> return err
> > >       Right (e', str') ->
> > >         let (fix, errs) = runWriter $ fixcr1 e'
> > >         in (lift $ modify (++errs))
> > >             >> case fix of
> > >                 FixItem ff -> (S.yield $ ff e')
> > >                 SkipItem   -> pure ()
> > >             >> loop str'
> > >     return 1
> > >
> > >
> > > (|>) = flip ($)
> > >
> > > main :: IO ()
> > > main = do
> > >   p <- runStateT flow []
> > >   print p
> > >   print "end."
> > >   where flow = gen
> > >           |> fixItems
> > >           |> S.map (+100)
> > >           |> S.mapM_ (liftIO . print)
> > >
> > > and to build it I included in cabal 2 dependencies:
> > > ...
> > >                      , transformers >= 0.5
> > >                      , streaming
> > > ...
> > >
> > >
> > > /Best regards, Paul
> > >
> > >
> > > Could you provide an example that compiles, including imports?  The
> > > reason I ask is that I'm not sure which mapM that you are using
> > >> On Jun 6, 2017, at 1:53 AM, aqu...@ <>gmail.com
> > >> <http://gmail.com/> wrote:
> > >>
> > >> Hello, everyone. Because I'm newbie, my question may seem naive.
> > >> but: I'm doing something like:
> > >> do
> > >>  str' <- S.mapM fun1 str
> > >>  str'' <- S.mapM fun2 str'
> > >>  -- so on
> > >> and I suppose that real iteration (if I use `mapM` or `iterM`, map
> > >> other functions of streaming) happens only one and resulting code
> > >> after compilation will looks like for item in str: item' = fun1
> > >> item item'' = fun2 item'
> > >>  -- so on
> > >> Am I right? Or are there some pitfalls here which we should
> > >> remember to accomplish such result?
> > >>
> > >>
> > >> /Best regards, Paul
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> You received this message because you are subscribed to the Google
> > >> Groups "Haskell Pipes" group. To unsubscribe from this group and
> > >> stop receiving emails from it, send an email to haskell-pipe...@
> > >> <>googlegroups.com <http://googlegroups.com/>. To post to this
> > >> group, send email to haskel...@ <>googlegroups.com
> > >> <http://googlegroups.com/>.
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Haskell Pipes" group. To unsubscribe from this group and
> > > stop receiving emails from it, send an email to
> > > haskell-pipes+unsubscr...@googlegroups.com
> > > <mailto:haskell-pipes+unsubscr...@googlegroups.com>. To post to
> > > this group, send email to haskell-pipes@googlegroups.com
> > > <mailto:haskell-pipes@googlegroups.com>.
> >
>
>
>
> --
> Best regards,
>   Paul a.k.a. 6apcyk
>
> --
> You received this message because you are subscribed to the Google Groups
> "Haskell Pipes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to haskell-pipes+unsubscr...@googlegroups.com.
> To post to this group, send email to haskell-pipes@googlegroups.com.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to haskell-pipes+unsubscr...@googlegroups.com.
To post to this group, send email to haskell-pipes@googlegroups.com.

Re: [haskell-pipes] Multiple mapM and performance

Reply via email to