I think what's happening here is that the rewrite rule SPECIALIZE is producing turns the call to swap16 into a call to the specialized verision before any inlining happens[1], and then the specialized version can't be inlined[2].

Manually writing out a copy of swap16 at type Word16 -> Word16, marking it INLINE and setting up a rewite rule for specialization gives you the more efficient code:

changing the definition to swap16 to this:

swap16 :: Bits a => a -> a
swap16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)

{- # RULES "swap16/Word16" forall (v :: Int16) . swap16 v = swap16Word16 v #-}
{- # INLINE swap16Word16 #-}

swap16Word16 :: Word16 -> Word16
swap16Word16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)

gives this Core for foo:
Main.foo :: GHC.Word.Word16
[GlobalId]
[Str: DmdType]
Main.foo = case GHC.Word.Word16 GHC.Word.$wshift3 __word 3855 (-8)
           of ww1 { __DEFAULT ->
case GHC.Word.Word16 GHC.Word.$wshift3 __word 15 8 of ww { __DEFAULT -> GHC.Word.W16# (GHC.Prim.narrow16Word# (GHC.Prim.plusWord# ww1 ww))
           }
           }

and for swapWord16:

Swap.swap16Word16 :: GHC.Word.Word16 -> GHC.Word.Word16
[GlobalId]
[Arity 1
 NoCafRefs
 Str: DmdType U(L)m]
Swap.swap16Word16 = __inline_me (\ (v :: GHC.Word.Word16) ->
                                   GHC.Word.+2
(GHC.Word.$dmshiftR2 v (GHC.Base.I# 8)) (GHC.Word.shift2 (GHC.Word..&.2 v Swap.lit) (GHC.Base.I# 8)))


Removing the INLINE directive for swap16Word16 leaves this code for foo, with a call to the worker function for swap16Word16:

Main.foo :: GHC.Word.Word16
[GlobalId]
[Str: DmdType]
Main.foo = case GHC.Word.Word16 Swap.$wswap16Word16 __word 3855
           of ww1 { __DEFAULT ->
           GHC.Word.W16# ww1
           }

and these definitions for swap16Word16:

Swap.$wswap16Word16 :: GHC.Prim.Word# -> GHC.Prim.Word#
[GlobalId]
[Arity 1
 NoCafRefs
 Str: DmdType L]
Swap.$wswap16Word16 = \ (ww :: GHC.Prim.Word#) ->
case GHC.Prim.Word# GHC.Word.$wshift3 ww (-8) of ww1 { __DEFAULT -> case GHC.Prim.Word# GHC.Word.$wshift3 (GHC.Prim.and# ww __word 255) 8
                        of ww11 { __DEFAULT ->
GHC.Prim.narrow16Word# (GHC.Prim.plusWord# ww1 ww11)
                        }
                        }

Swap.swap16Word16 :: GHC.Word.Word16 -> GHC.Word.Word16
[GlobalId]
[Arity 1
 Worker Swap.$wswap16Word16
 NoCafRefs
 Str: DmdType U(L)m]
Swap.swap16Word16 = __inline_me (\ (w :: GHC.Word.Word16) ->
case GHC.Word.Word16 w of w1 { GHC.Word.W16# ww -> case GHC.Word.Word16 Swap.$wswap16Word16 ww of ww1 { __DEFAULT ->
                                   GHC.Word.W16# ww1
                                   }
                                   })

It looks like marking the function INLINE supresses the worker/wrapper split, and when there is not INLINE and the function get split, only the wrapper is eligible for inlining.

Brandon Moore

[1] (section 10.7.2 of the 6.4.1 User's guide says "In the earlier phases of compilation, GHC inlines nothing that appears on the LHS of a rule, because once you have substituted for something you can't match against it (given the simple minded matching).")

[2] Maybe because the worker isn't annotation with __inline_me in the Core? Maybe from the aside in 7.10.5 about and INLINE pragma preventing any inlining in the RHS of the thing marked inline?
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to