Re: [GHC] #7429: Unexplained performance boost with +RTS -h

GHC Mon, 19 Nov 2012 20:18:54 -0800

#7429: Unexplained performance boost with +RTS -h
---------------------------------+------------------------------------------
    Reporter:  simonmar          |       Owner:                         
        Type:  bug               |      Status:  new                    
    Priority:  normal            |   Milestone:  7.8.1                  
   Component:  Compiler          |     Version:  7.6.1                  
    Keywords:                    |          Os:  Unknown/Multiple       
Architecture:  Unknown/Multiple  |     Failure:  Runtime performance bug
  Difficulty:  Unknown           |    Testcase:                         
   Blockedby:                    |    Blocking:                         
     Related:                    |  
---------------------------------+------------------------------------------


Comment(by tomberek):

 Replying to [ticket:7429 simonmar]:


 {{{
 {-# LANGUAGE
         RankNTypes
         ,RecursiveDo
         ,TypeFamilies
         ,GeneralizedNewtypeDeriving
   #-}

 import System.TimeIt
 import Control.Monad.State.Strict
 import Control.Monad.ST.Strict
 import Data.STRef

 -- STO Monad is just StateT with an Int stacked on top of a ST monad.
 newtype STO s a = STO { unSTO :: StateT Int (ST s) a }
     deriving (Functor, Monad)

 runSTO :: (forall s. STO s a) -> a
 runSTO x = runST (evalStateT (unSTO x) 0)

 data CircList s = ConsM {value :: {-# UNPACK #-} !Int
                         ,cons  :: {-# UNPACK #-} !(STRef s (CircList s ))
 }

 -- | Defines a circular list of length 2
 twoItemList :: forall s. ST s (CircList s)
 twoItemList = do
     rec item1 <- liftM (ConsM 0) $ newSTRef item2
         item2 <- liftM (ConsM 1) $ newSTRef item1
     return item1

 -- | I tie a circular list of size 2 and step through it n times.
 main :: IO ()
 main = do
     let n = 633222111 :: Int
     let takeOneStep = switchSTorSTO . readSTRef . cons

     runLine $ switchSTorSTO twoItemList  >>= liftM value . iterateM n
 takeOneStep

 --switchSTorSTO = id
 --runLine = timeIt . print . runST
 --  ****************************  TO SWITCH TO ST: switch between the two
 sets of definitions above and below this line
 switchSTorSTO  = STO . lift
 runLine = timeIt . print . runSTO

 -- I find myself writing this function a lot, did I miss some basic Monad
 helper?
 iterateM :: (Monad m) => Int -> (b -> m b) -> b -> m b
 iterateM n f c = go n c where
     go 0 b = return b
     go ns b = f b >>= go (ns-1)

 }}}

 Here is a simpler version (also attached as Main.hs) that uses a circular
 list of length two. The odd thing is that this runs around 3 seconds when
 I only use the ST monad, or the STO without profiling. But the STO with
 profiling makes it FASTER to 2.6 seconds. The two versions seem to have
 the same GC stats as before. So then why not just use profiling ALL the
 time? I don't know if this difference exists for large 'project sized'
 uses of ST, but it seems a monad transformer on top of it along with
 profiling helps it go faster.

 The STO version produces this core, which looks pretty good to me:

 {{{
 Rec {
 Main.$wa [Occ=LoopBreaker]
   :: forall s_XF2.
      GHC.Prim.Int#
      -> Main.CircList s_XF2
      -> GHC.Types.Int
      -> GHC.Prim.State# s_XF2
      -> (# GHC.Prim.State# s_XF2,
            (Main.CircList s_XF2, GHC.Types.Int) #)
 [GblId, Arity=4, Caf=NoCafRefs, Str=DmdType LLLL]
 Main.$wa =
   \ (@ s_XF2)
     (ww_s1na :: GHC.Prim.Int#)
     (w_s1nc :: Main.CircList s_XF2)
     (w1_s1nd :: GHC.Types.Int)
     (w2_s1ne :: GHC.Prim.State# s_XF2) ->
     case ww_s1na of ds_XFh {
       __DEFAULT ->
         case w_s1nc of _ { Main.ConsM rb_dFC rb1_dFD ->
         case GHC.Prim.readMutVar#
                @ s_XF2 @ (Main.CircList s_XF2) rb1_dFD w2_s1ne
         of _ { (# ipv_aJ9, ipv1_aJa #) ->
         Main.$wa @ s_XF2 (GHC.Prim.-# ds_XFh 1) ipv1_aJa w1_s1nd ipv_aJ9
         }
         };
       0 -> (# w2_s1ne, (w_s1nc, w1_s1nd) #)
     }
 end Rec }
 }}}

 The effect disappears when you switch to using just ST, but the core looks
 almost identical other than the threading of the original Int in the
 StateT. Here is the core produced for the ST version:

 {{{
 Rec {
 Main.$wpoly_a [Occ=LoopBreaker]
   :: forall s_aDw.
      GHC.Prim.Int#
      -> Main.CircList s_aDw
      -> GHC.Prim.State# s_aDw
      -> (# GHC.Prim.State# s_aDw, Main.CircList s_aDw #)
 [GblId, Arity=3, Caf=NoCafRefs, Str=DmdType LLL]
 Main.$wpoly_a =
   \ (@ s_aDw)
     (ww_s1la :: GHC.Prim.Int#)
     (w_s1lc :: Main.CircList s_aDw)
     (w1_s1ld :: GHC.Prim.State# s_aDw) ->
     case ww_s1la of ds_XFl {
       __DEFAULT ->
         case w_s1lc of _ { Main.ConsM rb_dFI rb1_dFJ ->
         case GHC.Prim.readMutVar#
                @ s_aDw @ (Main.CircList s_aDw) rb1_dFJ w1_s1ld
         of _ { (# ipv_aIP, ipv1_aIQ #) ->
         Main.$wpoly_a @ s_aDw (GHC.Prim.-# ds_XFl 1) ipv1_aIQ ipv_aIP
         }
         };
       0 -> (# w1_s1ld, w_s1lc #)
     }
 end Rec }
 }}}

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7429#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Re: [GHC] #7429: Unexplained performance boost with +RTS -h

Reply via email to