#7429: Unexplained performance boost with +RTS -h ---------------------------------+------------------------------------------ Reporter: simonmar | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.6.1 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------
Comment(by tomberek): Replying to [ticket:7429 simonmar]: {{{ {-# LANGUAGE RankNTypes ,RecursiveDo ,TypeFamilies ,GeneralizedNewtypeDeriving #-} import System.TimeIt import Control.Monad.State.Strict import Control.Monad.ST.Strict import Data.STRef -- STO Monad is just StateT with an Int stacked on top of a ST monad. newtype STO s a = STO { unSTO :: StateT Int (ST s) a } deriving (Functor, Monad) runSTO :: (forall s. STO s a) -> a runSTO x = runST (evalStateT (unSTO x) 0) data CircList s = ConsM {value :: {-# UNPACK #-} !Int ,cons :: {-# UNPACK #-} !(STRef s (CircList s )) } -- | Defines a circular list of length 2 twoItemList :: forall s. ST s (CircList s) twoItemList = do rec item1 <- liftM (ConsM 0) $ newSTRef item2 item2 <- liftM (ConsM 1) $ newSTRef item1 return item1 -- | I tie a circular list of size 2 and step through it n times. main :: IO () main = do let n = 633222111 :: Int let takeOneStep = switchSTorSTO . readSTRef . cons runLine $ switchSTorSTO twoItemList >>= liftM value . iterateM n takeOneStep --switchSTorSTO = id --runLine = timeIt . print . runST -- **************************** TO SWITCH TO ST: switch between the two sets of definitions above and below this line switchSTorSTO = STO . lift runLine = timeIt . print . runSTO -- I find myself writing this function a lot, did I miss some basic Monad helper? iterateM :: (Monad m) => Int -> (b -> m b) -> b -> m b iterateM n f c = go n c where go 0 b = return b go ns b = f b >>= go (ns-1) }}} Here is a simpler version (also attached as Main.hs) that uses a circular list of length two. The odd thing is that this runs around 3 seconds when I only use the ST monad, or the STO without profiling. But the STO with profiling makes it FASTER to 2.6 seconds. The two versions seem to have the same GC stats as before. So then why not just use profiling ALL the time? I don't know if this difference exists for large 'project sized' uses of ST, but it seems a monad transformer on top of it along with profiling helps it go faster. The STO version produces this core, which looks pretty good to me: {{{ Rec { Main.$wa [Occ=LoopBreaker] :: forall s_XF2. GHC.Prim.Int# -> Main.CircList s_XF2 -> GHC.Types.Int -> GHC.Prim.State# s_XF2 -> (# GHC.Prim.State# s_XF2, (Main.CircList s_XF2, GHC.Types.Int) #) [GblId, Arity=4, Caf=NoCafRefs, Str=DmdType LLLL] Main.$wa = \ (@ s_XF2) (ww_s1na :: GHC.Prim.Int#) (w_s1nc :: Main.CircList s_XF2) (w1_s1nd :: GHC.Types.Int) (w2_s1ne :: GHC.Prim.State# s_XF2) -> case ww_s1na of ds_XFh { __DEFAULT -> case w_s1nc of _ { Main.ConsM rb_dFC rb1_dFD -> case GHC.Prim.readMutVar# @ s_XF2 @ (Main.CircList s_XF2) rb1_dFD w2_s1ne of _ { (# ipv_aJ9, ipv1_aJa #) -> Main.$wa @ s_XF2 (GHC.Prim.-# ds_XFh 1) ipv1_aJa w1_s1nd ipv_aJ9 } }; 0 -> (# w2_s1ne, (w_s1nc, w1_s1nd) #) } end Rec } }}} The effect disappears when you switch to using just ST, but the core looks almost identical other than the threading of the original Int in the StateT. Here is the core produced for the ST version: {{{ Rec { Main.$wpoly_a [Occ=LoopBreaker] :: forall s_aDw. GHC.Prim.Int# -> Main.CircList s_aDw -> GHC.Prim.State# s_aDw -> (# GHC.Prim.State# s_aDw, Main.CircList s_aDw #) [GblId, Arity=3, Caf=NoCafRefs, Str=DmdType LLL] Main.$wpoly_a = \ (@ s_aDw) (ww_s1la :: GHC.Prim.Int#) (w_s1lc :: Main.CircList s_aDw) (w1_s1ld :: GHC.Prim.State# s_aDw) -> case ww_s1la of ds_XFl { __DEFAULT -> case w_s1lc of _ { Main.ConsM rb_dFI rb1_dFJ -> case GHC.Prim.readMutVar# @ s_aDw @ (Main.CircList s_aDw) rb1_dFJ w1_s1ld of _ { (# ipv_aIP, ipv1_aIQ #) -> Main.$wpoly_a @ s_aDw (GHC.Prim.-# ds_XFl 1) ipv1_aIQ ipv_aIP } }; 0 -> (# w1_s1ld, w_s1lc #) } end Rec } }}} -- Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7429#comment:2> GHC <http://www.haskell.org/ghc/> The Glasgow Haskell Compiler _______________________________________________ Glasgow-haskell-bugs mailing list Glasgow-haskell-bugs@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs