Nice work! No need to send the full profile, but what is the net speedup here? It seems like we should have eliminated about 10% of the runtime, but I wanted to verify that.
Also, what workload are you running on top? With all the time spent in PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're running the tester then that's not so bad, but if you're running a regular program that seems high. Thanks, Steve On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish <[email protected]> wrote: > These profile results from testing ALPHA_FS_MESI_CMP_directory with > configs/example/ruby_fs.py. The simulation was allowed to run for > 200,000,000,000 ticks. > > Profile Result with unmodified SLICC > > > % cumulative self self total > time seconds seconds calls s/call s/call name > 12.19 34.51 34.51 551229802 0.00 0.00 > CacheMemory::isTagPresent(Address const&) const > 8.41 58.33 23.82 17760155 0.00 0.00 PerfectSwitch::wakeup() > 4.49 71.03 12.70 235904391 0.00 0.00 Histogram::add(long > long) > 2.54 78.23 7.20 172127510 0.00 0.00 > CacheMemory::lookup(Address const&) > 2.33 84.82 6.59 93838596 0.00 0.00 > MessageBuffer::enqueue(RefCountingPtr<Message>, long long) > 2.10 90.77 5.95 105280086 0.00 0.00 > RubyEventQueue::scheduleEventAbsolute(Consumer*, long long) > 2.06 96.61 5.84 34537891 0.00 0.00 > BaseSimpleCPU::preExecute() > 1.95 102.12 5.51 43900461 0.00 0.00 > RubyPort::M5Port::recvTiming(Packet*) > 1.93 107.58 5.46 580192104 0.00 0.00 Set::Set(Set const&) > 1.92 113.02 5.44 46506080 0.00 0.00 > L1Cache_Controller::wakeup() > > > Result with modified SLICC > > > % cumulative self self total > time seconds seconds calls s/call s/call name > 9.97 24.78 24.78 17760155 0.00 0.00 PerfectSwitch::wakeup() > 5.42 38.27 13.49 101906879 0.00 0.00 > CacheMemory::lookup_ptr(Address const&) > 5.32 51.50 13.23 235904391 0.00 0.00 Histogram::add(long > long) > 2.30 57.21 5.71 580192104 0.00 0.00 Set::Set(Set const&) > 2.29 62.91 5.70 93838596 0.00 0.00 > MessageBuffer::enqueue(RefCountingPtr<Message>, long long) > 2.19 68.36 5.45 46506080 0.00 0.00 > L1Cache_Controller::wakeup() > 2.14 73.67 5.31 34537891 0.00 0.00 > BaseSimpleCPU::preExecute() > 2.10 78.89 5.22 11125106 0.00 0.00 > MemoryControl::executeCycle() > 2.06 84.02 5.13 96775149 0.00 0.00 > RubyEventQueueNode::process() > 1.98 88.94 4.92 105280086 0.00 0.00 > RubyEventQueue::scheduleEventAbsolute(Consumer*, long long) > . > . > . > 1.30 121.31 3.23 51172611 0.00 0.00 > CacheMemory::isTagPresent(Address const&) const > > > I can send the complete data generated by gprof, if required. > > I have inlined my comments. > > > > On Mon, 20 Dec 2010, Beckmann, Brad wrote: > > Hi Nilay, >> >> I apologize for the delay, but I was mostly travelling / in meetings last >> week and I didn't have a chance to review your patches and emails until this >> morning. >> >> Overall, your patches are definitely solid steps in the right direction >> and your profiling data sounds very promising. If you get the chance, >> please send it to me. I would be interested to know what are the top >> performance bottlenecks after your change. >> >> Before you spend time converting the other protocols, I do want to discuss >> the three points you brought up last week (see below). I have a bunch of >> free time over the next three days (Mon. - Wed.) and I do think a telephone >> conversation is best to discuss these details. Let me know what times work >> for you. >> > > The semester is over, so I am available almost throughout the day. Today, I > have a meeting at 3, which I think should be at most an hour long. Over next > two days, I do not have any thing scheduled so far. So any time will work. > > > >> Brad >> >> >> 1. Currently the implicit TBE and Cache Entry pointers are set to NULL in >> the calls to doTransition() function. To set these, we would need to make >> calls to a function that returns the pointer if the address is in the cache, >> NULL otherwise. >> >> I think we should retain the getEntry functions in the .sm files for in >> case of L1 cache both instruction and the data cache needs to be checked. >> This is something that I probably would prefer keeping out of SLICC. In >> fact, we should add getEntry functions for TBEs where ever required. >> >> These getEntry would now return a pointer instead of a reference. We would >> need to add support for return_by_pointer to SLICC. Also, since these >> functions would be used inside the Wakeup function, we would need to assume >> a common name for them across all protocols, just like getState() function. >> >> [BB] I would be very interested why you believe we should keep the >> getEntry functions out of SLICC. In my mind, this is one of the few >> functions that is very consistent across protocols. As I mentioned before, >> I really want to keep any notion of pointers out of the .sm files and avoid >> the changes you are proposing to getCacheEntry. We should probably discuss >> this in detail over-the-phone. >> > > We would need to figure out the cache memories machine has, their > hierarchy, whether there are I and D caches. In fact, MOESI-hammer has L1I > cache, L1D cache and L2 all in the same machine. I think we should not do > this analysis in the compiler. > > > >> 2. I still think we would need to change the changePermission function in >> the CacheMemory class. Presently it calls findTagInSet() twice. Instead, we >> would pass on the CacheEntry whose permissions need to be changed. This >> would save one call. We should also put the variable m_locked in the >> AbstractCacheEntry (may be make it part of the permission variable) to avoid >> the second call. >> >> [BB] I like moving the locked field to AbstractCacheEntry and removing the >> separate m_locked data structure. However, just a minor point, but we >> should avoid duplicating code in CacheMemory to support this change. Other >> than that, this looks good to me. >> > > Could not resist my self from carrying out this change. Now > changePermission() resides in AbstractCacheEntry. To get this working, I had > to change SLICC to support calls to member functions of base class using an > object of derived class. > > > >> 3. In the getState() and setState() functions, we need to specify that the >> function assumes that implicit TBE and CacheEntry pointers have been passed >> as arguments. How should we do this? I think we would need to push them in >> to the symbol table before they can be used in side the function. >> >> [BB] I'm a little confused by your current patch. It appears that you are >> proposing having two pairs of getState and setState functions. I would >> really like to avoid that and just have one pair of getState and setState >> functions. Also when I say implicitly pass the TBE and CacheEntry pointers, >> I mean that for the actions (similar to address). However, I think it is >> fine to explicitly pass these parameters into getState and setState (also >> similar to Address and State). >> > > No, I am not proposing two different versions of these functions. SLICC has > always assumed that getState() and setState() functions exist. Using this, I > make SLICC push cache_entry and tbe in to the Symbol Table when it > encounters declarations of getState() and setState(). As you pointed out, > this is similar to pushing address variable in Action declarations. This > works fine. > > I think the major problem is in letting a function use a pointer, for > example cache_entry_ptr in two different forms - as 'cache_entry_ptr' and as > '*cache_entry_ptr'. The first form is used for passing to other functions, > the second form for editing the underlying object's data members. Right now, > in order to over come this, I push two instances of the variable on to the > symbol table, one that will output the first form, another that will output > the second form. > > > >> >> _______________________________________________ >> m5-dev mailing list >> [email protected] >> http://m5sim.org/mailman/listinfo/m5-dev >> >> _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev >
_______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
