I profiled m5 again, using the following command.
./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
--maxtick 200000000000 -n 8 --topology Mesh --mesh-rows 2 --num-l2caches 8
--num-dirs 8
Results have been copied below. CacheMemory::lookup() still consumes some
time, but a lot less than before. PerfectSwitch could be a candidate for
redesign, but given that it is taking only 3% of the time, redesigning it
alone would not yield that much gain. Maybe we need to look at these
different functions in a more holistic fashion.
--
Nilay
  %   cumulative    self               self    total
 time    seconds  seconds      calls  s/call  s/call  name
 7.29      35.89    35.89  606750284    0.00    0.00  Histogram::add(long long)
 5.84      64.62    28.73  256533483    0.00    0.00  CacheMemory::lookup(Address const&)
 4.56      87.06    22.44  124360139    0.00    0.00  L1Cache_Controller::wakeup()
 3.88     106.18    19.12  121283110    0.00    0.00  RubyPort::M5Port::recvTiming(Packet*)
 3.13     121.60    15.42    6875704    0.00    0.00  PerfectSwitch::wakeup()
 3.00     136.38    14.78   39527382    0.00    0.00  MemoryControl::executeCycle()
 2.95     150.91    14.53   90855686    0.00    0.00  BaseSimpleCPU::preExecute()
 2.68     164.10    13.19  147750111    0.00    0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.41     175.96    11.86  180281626    0.00    0.00  RubyEventQueueNode::process()
 2.07     186.17    10.21  302054741    0.00    0.00  EventQueue::serviceOne()
 2.03     196.15     9.98  121176116    0.00    0.00  Sequencer::getRequestStatus(RubyRequest const&)
On Tue, 18 Jan 2011, Nilay wrote:
Brad,
I got the simulation working. It seems to me that you wrote Mesh.py under
the assumption that the number of cpus = number of L1 controllers = number
of L2 controllers (if present) = number of directory controllers.
The following options worked after some struggle and some help from Arka:
./build/ALPHA_FS_MESI_CMP_directory/m5.fast ./configs/example/ruby_fs.py
--maxtick 2000000000 -n 16 --topology Mesh --mesh-rows 4 --num-dirs 16
--num-l2caches 16
--
Nilay
On Tue, January 18, 2011 10:28 am, Beckmann, Brad wrote:
Hi Nilay,
My plan is to tackle the functional access support as soon as I check in
our current group of outstanding patches. I'm hoping to check in at least
the majority of them in the next couple of days. Now that you've completed
the CacheMemory access changes, you may want to re-profile GEM5 and make
sure the next performance bottleneck is routing network messages in the
PerfectSwitch. In particular, you'll want to look at rather large (16+
core) systems using a standard Mesh network. If you have any questions on
how to do that, Arka may be able to help you out; if not, I can certainly
help you. Assuming the PerfectSwitch shows up as a major bottleneck
(> 10%), I would suggest that as the next area to work on. When looking at
possible solutions, don't limit yourself to just changes within the
PerfectSwitch itself. I suspect that redesigning how destinations are
encoded and/or the interface between MessageBuffer dequeues and the
PerfectSwitch wakeup will lead to a better solution.
Brad
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev