The $f2 comes from the instance Monad (IterateeGM ...). print_lines uses a specialised version of that instance, namely Monad (IterateeGM el IO). The fact that print_lines uses it makes GHC generate a specialised version of the instance declaration.
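To make the mechanism concrete, here is a minimal sketch, assuming a current GHC (which needs Functor and Applicative instances alongside Monad). It is not the IterateeM code: StepM, liftBase, next and printAll are hypothetical names invented for illustration, with StepM standing in for IterateeGM and printAll for print_lines. The instance is overloaded in the base monad m, printAll uses it at m = IO, and the SPECIALISE instance pragma asks for the IO version explicitly, so it should not depend on such a use site surviving dead-code elimination.

```haskell
-- Hypothetical stand-in for IterateeGM el m: a computation that consumes
-- a list of elements of type el while running in a base monad m.
newtype StepM el m a = StepM { runStepM :: [el] -> m (a, [el]) }

instance Functor m => Functor (StepM el m) where
  fmap f (StepM g) = StepM $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Monad m => Applicative (StepM el m) where
  pure a = StepM $ \s -> return (a, s)
  StepM f <*> StepM g = StepM $ \s -> do
    (h, s')  <- f s
    (a, s'') <- g s'
    return (h a, s'')

-- The instance is overloaded in m. The pragma requests a copy of the
-- instance methods specialised to m = IO.
instance Monad m => Monad (StepM el m) where
  {-# SPECIALISE instance Monad (StepM el IO) #-}
  StepM g >>= k = StepM $ \s -> do
    (a, s') <- g s
    runStepM (k a) s'

-- Run an action of the base monad without touching the input.
liftBase :: Monad m => m a -> StepM el m a
liftBase act = StepM $ \s -> do
  a <- act
  return (a, s)

-- Take the next element from the input, if there is one.
next :: Monad m => StepM el m (Maybe el)
next = StepM $ \s -> return $ case s of
  []       -> (Nothing, [])
  (x : xs) -> (Just x, xs)

-- Analogue of print_lines: a use of the Monad instance at m = IO.
-- Prints every element and returns how many were printed.
printAll :: Show el => StepM el IO Int
printAll = go 0
  where
    go n = do
      mx <- next
      case mx of
        Nothing -> return n
        Just x  -> liftBase (print x) >> go (n + 1)

main :: IO ()
main = do
  (n, _) <- runStepM printAll [1, 2, 3 :: Int]
  putStrLn ("printed " ++ show n ++ " elements")
```

Whether the specialised code is actually generated and used depends on the optimisation level; comparing the .hi files with and without the pragma (ghc --show-iface) is one way to check.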

Even in the absence of print_lines you can generate the specialised instance thus:

  instance Monad m => Monad (IterateeGM el m) where
    {-# SPECIALISE instance Monad (IterateeGM el IO) #-}
    ... methods...

Does that help?

Simon

| -----Original Message-----
| From: John Lato [mailto:[EMAIL PROTECTED]
| Sent: 28 November 2008 12:07
| To: Simon Peyton-Jones
| Cc: Neil Mitchell; glasgow-haskell-users@haskell.org; Don Stewart
| Subject: Re: cross module optimization issues
|
| Neil, thank you very much for taking the time to look at this; I
| greatly appreciate it.
|
| One thing I don't understand is why the specializations are caused by
| print_lines. I suppose the optimizer can infer something which it
| couldn't otherwise.
|
| If I read this properly, the functions being specialized are liftI,
| (>>=), return, and $f2. One thing I'm not sure about is when INLINE
| provides the desired optimal behavior, as opposed to SPECIALIZE. The
| monad functions are defined in the Monad instance, and thus aren't
| currently INLINE'd or SPECIALIZE'd. However, if they are separate
| functions, would INLINE be sufficient? Would that give the optimizer
| enough to work with to derive the specializations on its own? I'll
| have some time to experiment with this myself tomorrow, but I'd
| appreciate some direction (rather than guessing blindly).
|
| What is "$f2"? I've seen that appear before, but I'm not sure where
| it comes from.
|
| Thanks,
| John
|
| On Fri, Nov 28, 2008 at 10:31 AM, Simon Peyton-Jones
| <[EMAIL PROTECTED]> wrote:
| > The specialisations are indeed caused (indirectly) by the presence of print_lines. If
| > print_lines is dead code (as it is when print_lines is not exported), then there are no calls
| > to the overloaded functions at these specialised types, and so you don't get the specialised
| > versions. You can get specialised versions by a SPECIALISE pragma, or SPECIALISE INSTANCE.
| >
| > Does that make sense?
| >
| > Simon
| >
| > | -----Original Message-----
| > | From: Neil Mitchell [mailto:[EMAIL PROTECTED]
| > | Sent: 28 November 2008 09:48
| > | To: Simon Peyton-Jones
| > | Cc: John Lato; glasgow-haskell-users@haskell.org; Don Stewart
| > | Subject: Re: cross module optimization issues
| > |
| > | Hi
| > |
| > | I've talked to John a bit, and discussed test cases etc. I've tracked
| > | this down a little way.
| > |
| > | Given the attached file, compiling with SHORT_EXPORT_LIST makes the
| > | code go _slower_. By exporting the "print_lines" function the code
| > | doubles in speed. This runs against everything I was expecting, and
| > | that Simon has described.
| > |
| > | Taking a look at the .hi files for the two alternatives, there are two
| > | differences:
| > |
| > | 1) In the faster .hi file, the body of print_lines is exported. This
| > | is reasonable and expected.
| > |
| > | 2) In the faster .hi file, there are additional specialisations, which
| > | seemingly have little/nothing to do with print_lines, but are omitted
| > | if it is not exported:
| > |
| > | "SPEC >>= [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                   $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.>>= @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.a
| > |     `cast`
| > |     (forall el1 a b.
| > |      Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO a
| > |      -> (a -> Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO b)
| > |      -> trans
| > |           (sym ((GHC.IOBase.:CoIO)
| > |                   (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO b)))
| > |           (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO b)))
| > |     @ el
| > | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                   $dMonad ::
| > |                                     GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.$s$f2 @ el
| > | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                   $dMonad ::
| > |                                     GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.$s$f21 @ el
| > | "SPEC Sound.IterateeM.liftI [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                   @ a
| > |                                   $dMonad ::
| > |                                     GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.liftI @ GHC.IOBase.IO @ el @ a $dMonad
| > |   = Sound.IterateeM.$sliftI @ el @ a
| > | "SPEC return [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                   $dMonad :: GHC.Base.Monad
| > |                                                GHC.IOBase.IO
| > |   Sound.IterateeM.return @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.a7
| > |     `cast`
| > |     (forall el1 a.
| > |      a
| > |      -> trans
| > |           (sym ((GHC.IOBase.:CoIO)
| > |                   (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO a)))
| > |           (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO a)))
| > |     @ el
| > |
| > | My guess is that these cause the slowdown - but is there any reason
| > | that print_lines not being exported should cause them to be omitted?
| > |
| > | All these tests were run on GHC 6.10.1 with -O2.
| > |
| > | Thanks
| > |
| > | Neil
| > |
| > |
| > | On Fri, Nov 21, 2008 at 10:33 AM, Simon Peyton-Jones
| > | <[EMAIL PROTECTED]> wrote:
| > | > | This project is based on Oleg's Iteratee code; I started using his
| > | > | IterateeM.hs and Enumerator.hs files and added my own stuff to
| > | > | Enumerator.hs (thanks Oleg, great work as always).
| > | > | When I started cleaning up by moving my functions from
| > | > | Enumerator.hs to MyEnum.hs, my minimal test case increased from
| > | > | 19s to 43s.
| > | > |
| > | > | I've found two factors that contributed. When I was cleaning up, I
| > | > | also removed a bunch of unused functions from IterateeM.hs (some of
| > | > | the test functions and functions specific to his running example of
| > | > | HTTP encoding). When I added those functions back in, and added
| > | > | INLINE pragmas to the exported functions in MyEnum.hs, I got the
| > | > | performance back.
| > | > |
| > | > | In general I hadn't added export lists to the modules yet, so all
| > | > | functions should have been exported.
| > | >
| > | > I'm totally snowed under with backlog from my recent absence, so I can't look at this
| > | > myself, but if anyone else wants to I'd be happy to support with advice and suggestions.
| > | >
| > | > In general, having an explicit export list is good for performance. I typed an extra section
| > | > in the GHC performance resource http://haskell.org/haskellwiki/Performance/GHC to explain why.
| > | > In general that page is where we should document user advice for performance in GHC.
| > | >
| > | > I can't explain why *adding* unused functions would change performance though!
| > | >
| > | > Simon
| > | >
| > | >
| > | > _______________________________________________
| > | > Glasgow-haskell-users mailing list
| > | > Glasgow-haskell-users@haskell.org
| > | > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
| > | >
| >