I am running m5.prof multiple times to get an idea of average performance.
I will get back to you later today with the numbers.
Thanks
Nilay
On Mon, 20 Dec 2010, Steve Reinhardt wrote:
Nice work! No need to send the full profile, but what is the net speedup
here? It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.
Also, what workload are you running on top? With all the time spent in
PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're
running the tester then that's not so bad, but if you're running a regular
program that seems high.
Thanks,
Steve
On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish <[email protected]> wrote:
These are the profile results from testing ALPHA_FS_MESI_CMP_directory with
configs/example/ruby_fs.py. The simulation was allowed to run for
200,000,000,000 ticks.
Profile Result with unmodified SLICC
     % cumulative     self               self   total
  time    seconds  seconds      calls  s/call  s/call  name
 12.19      34.51    34.51  551229802    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
  8.41      58.33    23.82   17760155    0.00    0.00  PerfectSwitch::wakeup()
  4.49      71.03    12.70  235904391    0.00    0.00  Histogram::add(long long)
  2.54      78.23     7.20  172127510    0.00    0.00  CacheMemory::lookup(Address const&)
  2.33      84.82     6.59   93838596    0.00    0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
  2.10      90.77     5.95  105280086    0.00    0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
  2.06      96.61     5.84   34537891    0.00    0.00  BaseSimpleCPU::preExecute()
  1.95     102.12     5.51   43900461    0.00    0.00  RubyPort::M5Port::recvTiming(Packet*)
  1.93     107.58     5.46  580192104    0.00    0.00  Set::Set(Set const&)
  1.92     113.02     5.44   46506080    0.00    0.00  L1Cache_Controller::wakeup()
Result with modified SLICC
     % cumulative     self               self   total
  time    seconds  seconds      calls  s/call  s/call  name
  9.97      24.78    24.78   17760155    0.00    0.00  PerfectSwitch::wakeup()
  5.42      38.27    13.49  101906879    0.00    0.00  CacheMemory::lookup_ptr(Address const&)
  5.32      51.50    13.23  235904391    0.00    0.00  Histogram::add(long long)
  2.30      57.21     5.71  580192104    0.00    0.00  Set::Set(Set const&)
  2.29      62.91     5.70   93838596    0.00    0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
  2.19      68.36     5.45   46506080    0.00    0.00  L1Cache_Controller::wakeup()
  2.14      73.67     5.31   34537891    0.00    0.00  BaseSimpleCPU::preExecute()
  2.10      78.89     5.22   11125106    0.00    0.00  MemoryControl::executeCycle()
  2.06      84.02     5.13   96775149    0.00    0.00  RubyEventQueueNode::process()
  1.98      88.94     4.92  105280086    0.00    0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
  ...
  1.30     121.31     3.23   51172611    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
I can send the complete data generated by gprof, if required.
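For a rough answer to the net-speedup question, note that each flat-profile row implies a total profiled time, since "% time" is self seconds divided by total seconds. The Python sketch below is only an estimate under that assumption. It uses the top row of each table above; the helper name total_runtime is made up for illustration, and gprof rounds both columns, so the result is approximate.

```python
def total_runtime(self_seconds, percent_time):
    """Estimate total profiled time from one gprof flat-profile row.

    Assumes percent_time == 100 * self_seconds / total, as in gprof's
    flat profile. Both inputs are rounded, so this is an approximation.
    """
    return self_seconds / (percent_time / 100.0)


# Top row of each flat profile quoted above: (self seconds, % time).
before = total_runtime(34.51, 12.19)  # CacheMemory::isTagPresent, unmodified SLICC
after = total_runtime(24.78, 9.97)    # PerfectSwitch::wakeup, modified SLICC

saved = before - after
speedup_pct = 100.0 * saved / before

print(f"before ~{before:.1f} s, after ~{after:.1f} s, saved ~{speedup_pct:.1f}%")
```

With these figures the estimate works out to roughly 283 s before and 249 s after, i.e. a reduction of about 12%, which is in line with the ~10% guess. Averaging over several m5.prof runs would give a more trustworthy number.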
I have inlined my comments.