I profiled m5 again, using the following command.
./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
--maxtick 200000000000 -n 8 --topology Mesh --mesh-rows 2 --num-l2caches 8
--num-dirs 8
Results have been copied below. CacheMemory::lookup() still consumes some
time, but a lot less than before. PerfectSwitch could be a candidate for
redesign, but given that it is taking only 3% of the time, redesigning it
alone would not yield that much gain. Maybe we need to look at these
different functions in a more holistic fashion.
--
Nilay
  %   cumulative    self               self    total
 time    seconds  seconds      calls  s/call  s/call  name
 7.29      35.89    35.89  606750284    0.00    0.00  Histogram::add(long long)
 5.84      64.62    28.73  256533483    0.00    0.00  CacheMemory::lookup(Address const&)
 4.56      87.06    22.44  124360139    0.00    0.00  L1Cache_Controller::wakeup()
 3.88     106.18    19.12  121283110    0.00    0.00  RubyPort::M5Port::recvTiming(Packet*)
 3.13     121.60    15.42    6875704    0.00    0.00  PerfectSwitch::wakeup()
 3.00     136.38    14.78   39527382    0.00    0.00  MemoryControl::executeCycle()
 2.95     150.91    14.53   90855686    0.00    0.00  BaseSimpleCPU::preExecute()
 2.68     164.10    13.19  147750111    0.00    0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.41     175.96    11.86  180281626    0.00    0.00  RubyEventQueueNode::process()
 2.07     186.17    10.21  302054741    0.00    0.00  EventQueue::serviceOne()
 2.03     196.15     9.98  121176116    0.00    0.00  Sequencer::getRequestStatus(RubyRequest const&)
On Tue, 18 Jan 2011, Nilay wrote:
Brad,
I got the simulation working. It seems to me that you wrote Mesh.py under
the assumption that the number of cpus = number of L1 controllers = number
of L2 controllers (if present) = number of directory controllers.
The following options worked after some struggle and some help from Arka:
./build/ALPHA_FS_MESI_CMP_directory/m5.fast ./configs/example/ruby_fs.py
--maxtick 2000000000 -n 16 --topology Mesh --mesh-rows 4 --num-dirs 16
--num-l2caches 16
--
Nilay
On Tue, January 18, 2011 10:28 am, Beckmann, Brad wrote:
Hi Nilay,
My plan is to tackle the functional access support as soon as I check in
our current group of outstanding patches. I'm hoping to check in at least
the majority of them in the next couple of days. Now that you've completed
the CacheMemory access changes, you may want to re-profile GEM5 and make
sure the next performance bottleneck is routing network messages in the
PerfectSwitch. In particular, you'll want to look at rather large (16+
core) systems using a standard Mesh network. If you have any questions on
how to do that, Arka may be able to help you out; if not, I can certainly
help you. Assuming the PerfectSwitch shows up as a major bottleneck
(> 10%), I would suggest that as the next area to work on. When looking at
possible solutions, don't limit yourself to just changes within the
PerfectSwitch itself. I suspect that redesigning how destinations are
encoded and/or the interface between MessageBuffer dequeues and the
PerfectSwitch wakeup will lead to a better solution.
Brad
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev