fmohamed:
> I had posted some data on inter-module optimizations that I had
> calculated when splitting my program from one computational module to
> many different ones.
>
> Tim Chevalier suggested that my calculation could be interesting to the
> people here.
>
> So I made the effort of preparing the various versions of my code and
> redoing the analysis better.
> Unfortunately I had already begun renaming things without doing a darcs
> record, so in the split version some function names are different.
>
> I have a tar.bz archive of 21KB. I did not know whether it is considered
> rude to send attachments, but if someone is interested I can send the
> file.
>
> Basically it mainly boils down to non-inlining of some important
> functions on a newtype (
> type LatLocI = Word32
> newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)
> ), because specialization should not be an issue as I had already given
> specific signatures to my functions.
>
> Also worth noting is that using the profiling with -O2 compilation makes
> one think that inlining (or using a single module) makes the program
> slower, whereas the opposite is true. I think that the profiling
> overheads are incorrectly evaluated.
> I know that with -O2 one cannot expect profiling to be good, but it
> would be nice if it weren't so misleading.
>
> Here is some data (obtained with a script that is also in the tar.bz archive):
>
> ******** allInOne:
> original program, monolithic main computational module
> * timings of -O2 executable
> 7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+894minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 15.25 secs (305 ticks @ 50 ms)
> total alloc = 5,888,786,120 bytes (excludes profiling overheads)
> ******** splitModule NoReexport NoInline directives:
> split computational module, no export list for split modules
> * timings of -O2 executable
> 10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 11.85 secs (237 ticks @ 50 ms)
> total alloc = 5,888,780,912 bytes (excludes profiling overheads)
> ******** splitModule Reexport NoInline directives:
> computational module, no export list for split modules, old module
> reexport using export list
> * timings of -O2 executable
> 8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 12.20 secs (244 ticks @ 50 ms)
> total alloc = 5,888,780,912 bytes (excludes profiling overheads)
> ******** splitModule NoReexport Inline directives:
> split computational module, no export list for split modules, explicit
> inline directives
> * timings of -O2 executable
> 6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+895minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 18.80 secs (376 ticks @ 50 ms)
> total alloc = 5,374,883,312 bytes (excludes profiling overheads)
> *************
>
> Fawzi
To really understand what is going on, I suggest looking at the
-ddump-simpl output as you change the inlining settings. Then you'll see
how GHC is moving code about.

-- Don (who's spent the last 2 weeks playing the simplifier/inliner game)

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
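
For readers who want to try the -ddump-simpl experiment suggested in the reply, here is a minimal sketch built around the newtype quoted in the original message. Only LatLocI and LatLoc come from the post; stepLoc, sumSteps and the file name used below are hypothetical stand-ins, since the actual program was not posted. The idea is to compile once with the INLINE pragma and once without it, and compare the Core that GHC prints for the loop.

```haskell
-- Sketch only: stepLoc and sumSteps are hypothetical names, not taken from
-- the original program. LatLocI and LatLoc are the definitions quoted above.
import Data.List (foldl')
import Data.Word (Word32)

type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- A small wrapper function of the kind that is cheap only when inlined.
-- In a split build it would live in its own module; the INLINE pragma asks
-- GHC to make its unfolding available to the modules that import it.
{-# INLINE stepLoc #-}
stepLoc :: LatLoc -> LatLoc
stepLoc (LatLoc i) = LatLoc (i + 1)

-- A hot loop that applies the wrapper n times, starting from LatLoc 0.
sumSteps :: Int -> LatLocI
sumSteps n = unwrap (foldl' (\loc _ -> stepLoc loc) (LatLoc 0) [1 .. n])
  where
    unwrap (LatLoc i) = i

main :: IO ()
main = print (sumSteps 1000000)
```

Saved as, say, Sketch.hs, this can be compiled with `ghc -O2 -ddump-simpl Sketch.hs`. To reproduce the cross-module situation from the thread, one would move the newtype and stepLoc into a separate module, import it here, and check whether the call to stepLoc still shows up in the dumped Core for the loop.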