#2466: Execution time mis-accounting
---------------------------+------------------------------------------------
Reporter: heatsink | Owner:
Type: bug | Status: closed
Priority: high | Milestone: 7.4.1
Component: Profiling | Version: 7.3
Resolution: fixed | Keywords:
Testcase: | Blockedby:
Difficulty: Unknown | Os: Linux
Blocking: | Architecture: x86
Failure: None/Unknown |
---------------------------+------------------------------------------------
Changes (by simonmar):
* status: new => closed
* resolution: => fixed
Comment:
Thanks for updating the test case. I've looked into this today, and I
think the profile is correct.
With HEAD I see this:
{{{
CAF Main 127 0 0.0
0.0 98.2 99.9
icosahedron Main 256 0 0.0
0.0 0.0 0.0
scene Main 253 1 0.0
0.0 0.0 0.0
icosahedron Main 254 2 0.0
0.0 0.0 0.0
triangle Triangle 255 40 0.0
0.0 0.0 0.0
pgm Main 251 0 0.0
0.0 0.0 0.0
viewport Main 245 1 0.0
0.0 0.0 0.0
main Main 242 1 0.0
0.0 98.2 99.9
render Camera 247 1 8.4
19.5 91.0 85.6
raytrace Scene 252 102400 37.7
60.9 82.6 66.1
lambertian Shade 260 37215 0.0
0.0 3.6 3.7
sumForAllLights Shade 261 37215 3.6
2.3 3.6 3.7
illumination Shade 262 111645 0.0
1.3 0.0 1.3
shade Datatypes 259 37215 0.0
0.0 0.0 0.0
intersect Datatypes 257 4096000 21.6
0.0 41.3 1.5
triangle.intersect Triangle 258 4096000 19.8
1.5 19.8 1.5
qtrApply VectorMath 248 3 0.0
0.0 0.0 0.0
negateQtr VectorMath 250 3 0.0
0.0 0.0 0.0
qtrComp VectorMath 249 6 0.0
0.0 0.0 0.0
pgm Main 244 1 7.2
14.3 7.2 14.3
}}}
So we see the same number of calls to `intersect` as to the inner function
`triangle.intersect`, but the time is split about 50/50 between them. The
outer `intersect` consists of two cases and a function call; but I suspect
the crucial factor is that the inner call is a call to an unknown
function, so it has to go through the generic call machinery in the RTS.
It might also be calling a PAP (partial application), which is more
costly. If we do a low-level profile with `perf` (see
[wiki:Debugging/LowLevelProfiling/Perf]), we see:
{{{
11.71% htrace ./dist/build/htrace/htrace [.] s33w_info
10.94% htrace ./dist/build/htrace/htrace [.]
stg_upd_frame_info
8.36% htrace ./dist/build/htrace/htrace [.] pushCostCentre
6.06% htrace ./dist/build/htrace/htrace [.] stg_PAP_apply
4.89% htrace ./dist/build/htrace/htrace [.]
0x000000005f99c9
4.10% htrace ./dist/build/htrace/htrace [.] sCb_info
3.82% htrace ./dist/build/htrace/htrace [.] stg_ap_pp_fast
3.76% htrace ./dist/build/htrace/htrace [.]
0x0000000001f8c2
3.36% htrace ./dist/build/htrace/htrace [.] s33x_info
3.12% htrace ./dist/build/htrace/htrace [.] sJT_info
2.00% htrace ./dist/build/htrace/htrace [.]
enterFunCurShorter
1.98% htrace ./dist/build/htrace/htrace [.]
overwritingClosure
}}}
There is a lot of time being spent in `stg_PAP_apply` and
`stg_ap_pp_fast`, these are the RTS functions for applying a partial
application and applying an unknown function to two arguments
respectively.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/2466#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs