Did you forget to attach the picture?
I have known for some time that a major fraction of the time is spent in
RefCountingPtr::del(). It is called a great many times and each call can
end up invoking delete. I was thinking of moving to a system in which
memory for instructions is allocated statically when the simulator starts
and is reused, instead of calling new and delete all the time. Is this
what FastAlloc does?
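The allocate-once-and-reuse scheme could look something like the free-list pool sketched below. This is only a minimal illustration of the idea, not gem5's FastAlloc implementation; the names (Pool, Slot) are made up for the example.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Minimal sketch of a fixed-size object pool: all storage is allocated
// up front and recycled through a free list, so steady-state alloc/free
// never touches the heap. Hypothetical names, not gem5's FastAlloc API.
template <typename T>
class Pool {
    union Slot {
        Slot *next;                              // link when the slot is free
        alignas(T) unsigned char storage[sizeof(T)]; // payload when in use
    };
    std::vector<Slot> slots;   // backing storage, allocated once at start
    Slot *freeList = nullptr;  // singly linked list of free slots
  public:
    explicit Pool(std::size_t n) : slots(n) {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < n; ++i) {
            slots[i].next = freeList;
            freeList = &slots[i];
        }
    }
    template <typename... Args>
    T *alloc(Args&&... args) {
        // Pop a slot and construct the object in place; no call to new.
        if (!freeList) return nullptr;
        Slot *s = freeList;
        freeList = s->next;
        return new (s->storage) T(static_cast<Args&&>(args)...);
    }
    void free(T *p) {
        // Destroy the object and push its slot back; no call to delete.
        p->~T();
        Slot *s = reinterpret_cast<Slot *>(p);
        s->next = freeList;
        freeList = s;
    }
};
```

Freeing and re-allocating returns the same slot, which is exactly the reuse behavior you describe.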
Here is a profile result that I obtained from gprof some time last
year --
  %     cumulative    self                     self    total
 time     seconds   seconds         calls    s/call   s/call  name
 4.30     845.85     845.85  176602458326      0.00     0.00  RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del()
 4.26    1685.00     839.15    2820342034      0.00     0.00  DefaultFetch<O3CPUImpl>::fetch(bool&)
 3.35    2343.36     658.36    2820342034      0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
 3.05    2943.80     600.44    2426497872      0.00     0.00  DefaultRename<O3CPUImpl>::renameInsts(short)
 2.51    3437.20     493.40    2820342034      0.00     0.00  InstructionQueue<O3CPUImpl>::scheduleReadyInsts()
--
Nilay
On Tue, 29 May 2012, Ali Saidi wrote:
We recently took a look at the callgraph from gem5 with an O3 cpu
and it's pretty startling (see attached picture). The majority of time
is spent in memory management. The biggest chunk of this is in fetch
when instructions are built, however I assumed that FastAlloc would be
used. Nominally it would, except that with both ARM and x86 the size
of a DynInst is > 512 bytes, which is the maximum size FastAlloc handles.
Alpha seems to sneak under the limit, but either way it is astounding to
me that a single instruction requires over .5kB of storage. Doing some
quick math, if more than 64 dyninsts exist in the system they don't fit
in the L1 cache anymore. One thing we can do is increase the max size of
FastAlloc to 1kB, but it seems like we need to think about how to
slim down a DynInst. I've looked over it and it seems like we lose
around 48 bytes to alignment issues, as members are scattered throughout:
they are Addrs, then bools, and then more Addrs. It seems like changing
some of the bools we currently have to setters/getters with an
underlying bitvector might help, and we might want to think about
packing the most-used members together as opposed to the somewhat random
approach we have right now. You could nearly halve it if the processor
of interest doesn't have > 256 physical registers.
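[The padding effect described above can be demonstrated with a toy layout. The field names here are invented for illustration; they are not the actual DynInst members.]

```cpp
#include <cstdint>

// Scattered layout: on a typical 64-bit ABI each bool forces 7 bytes of
// padding so the following 8-byte member stays aligned.
struct Scattered {
    uint64_t pc;        // 8 bytes
    bool squashed;      // 1 byte + 7 bytes padding
    uint64_t predPC;    // 8 bytes
    bool completed;     // 1 byte + 7 bytes tail padding
};                      // typically 32 bytes

// Packed layout: group the 8-byte members, and fold the bools into one
// flags word accessed through setters/getters, as suggested above.
struct Packed {
    uint64_t pc;
    uint64_t predPC;
    uint8_t flags;      // bit 0 = squashed, bit 1 = completed

    bool squashed() const { return flags & 0x1; }
    void setSquashed(bool b) { flags = (flags & ~0x1u) | (b ? 0x1 : 0); }
    bool completed() const { return flags & 0x2; }
    void setCompleted(bool b) { flags = (flags & ~0x2u) | (b ? 0x2 : 0); }
};                      // typically 24 bytes
```

On common 64-bit ABIs the packed version saves 8 bytes here; scaled up to a 500+ byte DynInst with many scattered bools, that is where the ~48 bytes go.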
Furthermore,
looking at the picture there seem to be plenty of other places where
there are a lot of calls to new (teal-ish) / free (orange). It seems like
we could certainly make more use of FastAlloc, assuming it's actually
helping.
Thanks,
Ali
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev