Hi Mitch,
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB would
just fall over. Would it be possible to split this change into a series
of smaller patches?
For example, you could split it into one patch per functional unit and a
final patch that does some cleaning up. You could probably just 'fake'
new un-templated class names as typedefs in the relevant header files.
//Andreas
On 2014-05-13 18:23, Mitch Hayenga via gem5-dev wrote:
Hi All,
Recently I have written a patch that removes templating from the o3 cpu.
In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance. The
templating is largely pointless as 1) there aren't multiple versions of
fetch, rename, etc to make the compile time Impl pattern worth doing 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this sort? * It
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory). Many of the lines are simply because the
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently on o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev