#2466: Execution time mis-accounting
---------------------------+------------------------------------------------
  Reporter:  heatsink      |          Owner:        
      Type:  bug           |         Status:  closed
  Priority:  high          |      Milestone:  7.4.1 
 Component:  Profiling     |        Version:  7.3   
Resolution:  fixed         |       Keywords:        
  Testcase:                |      Blockedby:        
Difficulty:  Unknown       |             Os:  Linux 
  Blocking:                |   Architecture:  x86   
   Failure:  None/Unknown  |  
---------------------------+------------------------------------------------
Changes (by simonmar):

  * status:  new => closed
  * resolution:  => fixed


Comment:

 Thanks for updating the test case.  I've looked into this today, and I
 think the profile is correct.

 With HEAD I see this:

 {{{
  CAF                     Main                    127           0    0.0
 0.0    98.2   99.9
   icosahedron            Main                    256           0    0.0
 0.0     0.0    0.0
   scene                  Main                    253           1    0.0
 0.0     0.0    0.0
    icosahedron           Main                    254           2    0.0
 0.0     0.0    0.0
     triangle             Triangle                255          40    0.0
 0.0     0.0    0.0
   pgm                    Main                    251           0    0.0
 0.0     0.0    0.0
   viewport               Main                    245           1    0.0
 0.0     0.0    0.0
   main                   Main                    242           1    0.0
 0.0    98.2   99.9
    render                Camera                  247           1    8.4
 19.5    91.0   85.6
     raytrace             Scene                   252      102400   37.7
 60.9    82.6   66.1
      lambertian          Shade                   260       37215    0.0
 0.0     3.6    3.7
       sumForAllLights    Shade                   261       37215    3.6
 2.3     3.6    3.7
        illumination      Shade                   262      111645    0.0
 1.3     0.0    1.3
      shade               Datatypes               259       37215    0.0
 0.0     0.0    0.0
      intersect           Datatypes               257     4096000   21.6
 0.0    41.3    1.5
       triangle.intersect Triangle                258     4096000   19.8
 1.5    19.8    1.5
     qtrApply             VectorMath              248           3    0.0
 0.0     0.0    0.0
      negateQtr           VectorMath              250           3    0.0
 0.0     0.0    0.0
      qtrComp             VectorMath              249           6    0.0
 0.0     0.0    0.0
    pgm                   Main                    244           1    7.2
 14.3     7.2   14.3
 }}}

 So we see the same number of calls to `intersect` as to the inner function
 `triangle.intersect`, but the time is split about 50/50 between them.  The
 outer `intersect` consists of two cases and a function call; but I suspect
 the crucial factor is that the inner call is a call to an unknown
 function, so it has to go through the generic call machinery in the RTS.
 It might also be calling a PAP (partial application), which is more
 costly.  If we do a low-level profile with `perf` (see
 [wiki:Debugging/LowLevelProfiling/Perf]), we see:

 {{{
     11.71%          htrace  ./dist/build/htrace/htrace  [.] s33w_info
     10.94%          htrace  ./dist/build/htrace/htrace  [.]
 stg_upd_frame_info
      8.36%          htrace  ./dist/build/htrace/htrace  [.] pushCostCentre
      6.06%          htrace  ./dist/build/htrace/htrace  [.] stg_PAP_apply
      4.89%          htrace  ./dist/build/htrace/htrace  [.]
 0x000000005f99c9
      4.10%          htrace  ./dist/build/htrace/htrace  [.] sCb_info
      3.82%          htrace  ./dist/build/htrace/htrace  [.] stg_ap_pp_fast
      3.76%          htrace  ./dist/build/htrace/htrace  [.]
 0x0000000001f8c2
      3.36%          htrace  ./dist/build/htrace/htrace  [.] s33x_info
      3.12%          htrace  ./dist/build/htrace/htrace  [.] sJT_info
      2.00%          htrace  ./dist/build/htrace/htrace  [.]
 enterFunCurShorter
      1.98%          htrace  ./dist/build/htrace/htrace  [.]
 overwritingClosure
 }}}

 There is a lot of time being spent in `stg_PAP_apply` and
 `stg_ap_pp_fast`, these are the RTS functions for applying a partial
 application and applying an unknown function to two arguments
 respectively.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/2466#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to