Hi All,

Recently I wrote a patch that removes templating from the o3 cpu. In
general, templating in o3 makes the code significantly more verbose, adds
compile-time overhead, and doesn't actually benefit performance. The
templating is largely pointless because 1) there aren't multiple versions
of fetch, rename, etc. to make the compile-time Impl pattern worth doing,
and 2) modern CPUs have indirect branch predictors that hide the penalties
that the templating was trying to mask.
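
To make the pattern concrete, here is a minimal sketch of the "before" and
"after" shapes. All names in it (TemplatedFetch, SimpleImpl, SimpleDynInst,
Fetch, describe) are hypothetical stand-ins chosen for illustration, not the
actual gem5 classes or the code in the patch.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins; not the real gem5 classes.

// Stand-in for the dynamic instruction type.
struct SimpleDynInst
{
    std::string name;
};

// --- Before: compile-time "Impl" pattern ---
// Each stage is a template parameterized on a policy struct that names
// the concrete types used by the pipeline.
template <class Impl>
class TemplatedFetch
{
  public:
    using DynInst = typename Impl::DynInst;

    std::string
    describe(const DynInst &inst) const
    {
        return "fetch:" + inst.name;
    }
};

// The single policy struct that is ever instantiated in this sketch.
struct SimpleImpl
{
    using DynInst = SimpleDynInst;
    using Fetch = TemplatedFetch<SimpleImpl>;
};

// --- After: plain class ---
// With only one implementation, the stage names its types directly, and
// the member definitions can live in a source file instead of a header.
class Fetch
{
  public:
    std::string describe(const SimpleDynInst &inst) const;
};

std::string
Fetch::describe(const SimpleDynInst &inst) const
{
    return "fetch:" + inst.name;
}
```

Both versions behave identically when there is a single Impl; the plain
class just drops the extra type indirection and the header-only definitions.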

*I was wondering what people's feelings are on a patch of this sort?* It
is quite a large modification (~35k-line patch file, with changes almost
all localized to the o3 directory).  Many of the lines are there simply
because the "impl" header files were changed to source files.

Here are a few benefits of the patch:

   - Cleaner, less verbose code.
   - Due to the current templating/DynInst interaction, gem5 often requires
   rebuilding the function execution signatures (o3_cpu_exec.o) when a
   modification is made to the o3 cpu.  This patch eliminates having to
   rebuild the execution signatures on o3 changes.
   - Marginally better compile/run times.
   - Moved "base_dyn_inst_impl.hh" into o3; it's too dependent on o3 as is.
   No other cpu does or should inherit from it anyway.
   - Made the checker directly templated on the execution context (DynInst)
   instead of on an "Impl" as o3 was.  The checker seems to have been coded
   with a dependence on o3.
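
As a rough illustration of the checker change in the last bullet, the sketch
below contrasts a checker templated on an "Impl" policy struct with one
templated directly on the instruction type. The names (ExampleDynInst,
ExampleImpl, ImplChecker, Checker, verify) are assumptions made up for this
example and do not reflect the real checker interface.

```cpp
#include <cstdint>

// Hypothetical stand-in for the execution context / instruction type.
struct ExampleDynInst
{
    std::uint64_t pc;
    std::uint64_t result;
};

// Before: the checker receives an o3-style policy struct and has to
// extract the instruction type from it.
struct ExampleImpl
{
    using DynInst = ExampleDynInst;
};

template <class Impl>
class ImplChecker
{
  public:
    bool
    verify(const typename Impl::DynInst &inst, std::uint64_t expected) const
    {
        return inst.result == expected;
    }
};

// After: the checker is templated directly on the instruction type, so it
// no longer needs to know that an "Impl" struct exists.
template <class DynInst>
class Checker
{
  public:
    bool
    verify(const DynInst &inst, std::uint64_t expected) const
    {
        return inst.result == expected;
    }
};
```

Here ImplChecker<ExampleImpl> and Checker<ExampleDynInst> accept the same
instruction object, but only the first depends on the shape of the policy
struct.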


Here are some performance results for gem5.fast built with GCC 4.9 and
CLANG, running twolf from spec2k.

*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible (<0.0001%)


*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
    real    21m32.240s
    user    20m20.019s
    sys     1m6.721s

    real    21m29.963s
    user    20m17.016s
    sys     1m7.108s

*Untemplated*
    real    21m24.396s
    user    20m13.158s
    sys     1m5.798s

    real    21m23.177s
    user    20m11.911s
    sys     1m5.843s


*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
    real    11m35.848s
    user    67m20.828s
    sys     2m2.292s

*Untemplated*
    real    11m42.167s
    user    67m7.572s
    sys     2m2.056s


*CLANG Run Time (Spec2k twolf)*
*Templated*
    Run 1) 1187.63
    Run 2) 1167.50
    Run 3) 1172.06

*Untemplated*
    Run 1) 1142.29
    Run 2) 1154.49
    Run 3) 1165.53


*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
    Run 1) 12m20.528s
*Untemplated*
    Run 1) 12m19.700s



Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
