I am running m5.prof multiple times to get an idea of average performance. I will get back to you later today with the numbers.

Thanks
Nilay

On Mon, 20 Dec 2010, Steve Reinhardt wrote:

Nice work!  No need to send the full profile, but what is the net speedup
here?  It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.

Also, what workload are you running on top?  With all the time spent in
PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're
running the tester then that's not so bad, but if you're running a regular
program that seems high.

Thanks,

Steve

On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish wrote:

These are profile results from testing ALPHA_FS_MESI_CMP_directory with
configs/example/ruby_fs.py. The simulation was allowed to run for
200,000,000,000 ticks.

Profile results with unmodified SLICC:


  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 12.19     34.51    34.51 551229802     0.00     0.00  CacheMemory::isTagPresent(Address const&) const
  8.41     58.33    23.82  17760155     0.00     0.00  PerfectSwitch::wakeup()
  4.49     71.03    12.70 235904391     0.00     0.00  Histogram::add(long long)
  2.54     78.23     7.20 172127510     0.00     0.00  CacheMemory::lookup(Address const&)
  2.33     84.82     6.59  93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
  2.10     90.77     5.95 105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
  2.06     96.61     5.84  34537891     0.00     0.00  BaseSimpleCPU::preExecute()
  1.95    102.12     5.51  43900461     0.00     0.00  RubyPort::M5Port::recvTiming(Packet*)
  1.93    107.58     5.46 580192104     0.00     0.00  Set::Set(Set const&)
  1.92    113.02     5.44  46506080     0.00     0.00  L1Cache_Controller::wakeup()


Profile results with modified SLICC:


  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.97     24.78    24.78  17760155     0.00     0.00  PerfectSwitch::wakeup()
  5.42     38.27    13.49 101906879     0.00     0.00  CacheMemory::lookup_ptr(Address const&)
  5.32     51.50    13.23 235904391     0.00     0.00  Histogram::add(long long)
  2.30     57.21     5.71 580192104     0.00     0.00  Set::Set(Set const&)
  2.29     62.91     5.70  93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
  2.19     68.36     5.45  46506080     0.00     0.00  L1Cache_Controller::wakeup()
  2.14     73.67     5.31  34537891     0.00     0.00  BaseSimpleCPU::preExecute()
  2.10     78.89     5.22  11125106     0.00     0.00  MemoryControl::executeCycle()
  2.06     84.02     5.13  96775149     0.00     0.00  RubyEventQueueNode::process()
  1.98     88.94     4.92 105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
   ...
  1.30    121.31     3.23  51172611     0.00     0.00  CacheMemory::isTagPresent(Address const&) const

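A rough way to answer the net-speedup question from the two excerpts above: gprof's "% time" column is a function's self seconds divided by the run's total sampled time, so each run's total can be backed out from any single row. The Python sketch below does that arithmetic using only figures quoted in this thread. It assumes the two-decimal rounding of "% time" is precise enough for a ballpark number, and that one gprof run is representative (averaging several m5.prof runs would be more reliable).

```python
# Estimate the net speedup from the two gprof flat profiles quoted above.
# gprof's "% time" = self seconds / total sampled seconds, so one row is
# enough to back out a run's total. "% time" is rounded to two decimals,
# so every figure printed here is approximate.


def total_seconds(percent_time: float, self_seconds: float) -> float:
    """Estimate a profile's total sampled time from one flat-profile row."""
    return self_seconds / (percent_time / 100.0)


# Top row of each excerpt: (% time, self seconds).
before = total_seconds(12.19, 34.51)  # CacheMemory::isTagPresent, unmodified SLICC
after = total_seconds(9.97, 24.78)    # PerfectSwitch::wakeup, modified SLICC

reduction = (before - after) / before  # fraction of sampled runtime eliminated
speedup = before / after

# Self time of CacheMemory::isTagPresent: 34.51 s before, 3.23 s after.
tag_saving = 34.51 - 3.23

print(f"estimated total, unmodified SLICC: {before:.1f} s")
print(f"estimated total, modified SLICC:   {after:.1f} s")
print(f"net runtime reduction: {reduction:.1%} (speedup {speedup:.2f}x)")
print(f"isTagPresent self time saved: {tag_saving:.2f} s "
      f"({tag_saving / before:.1%} of the unmodified total)")
```

With these inputs the estimate is about 283.1 s of sampled time before and 248.5 s after, a net reduction of roughly 12% (about 1.14x). The self time saved in CacheMemory::isTagPresent alone comes to about 11% of the unmodified total, which is in line with the ~10% expectation. These are gprof sample totals for a profiling build, not wall-clock times.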

I can send the complete data generated by gprof, if required.

I have inlined my comments.



_______________________________________________
m5-dev mailing list
http://m5sim.org/mailman/listinfo/m5-dev
