Thank you very much. The patches have made PARSEC execution on the M5 simulator much more stable. However, there are still some hidden bugs.
I have successfully run 11 PARSEC benchmarks (out of 11) with the small input. In addition, I have run 4 benchmarks in 8 configurations, each with a different L2 latency. Only one of those configurations hung.

Regards,
Aleksei

On Sun, Oct 3, 2010 at 4:37 PM, Steve Reinhardt <[email protected]> wrote:
> The patches I referred to have since been committed; if you're using
> the latest code from the development repository, you've got them.
>
> Steve
>
> On Sun, Oct 3, 2010 at 6:04 AM, Lesha Jolondz <[email protected]> wrote:
> > Hi Steve,
> >
> > I experience the same problem running PARSEC benchmarks at 4 core
> > configuration with shared L2 cache. Could you please send me your patches.
> >
> > Thanks,
> > Aleksei
> >
> > On Tue, Aug 24, 2010 at 12:27 AM, Steve Reinhardt <[email protected]> wrote:
> >>
> >> Hi Stijn,
> >>
> >> It's true that there are subtle bugs in the coherence protocol that
> >> seem to appear only when you use a different DRAM model that creates
> >> different timings. I spent a fair amount of time a month or two ago
> >> to try and fix things up, and I made some progress, but it's hard to
> >> fix one subtle bug without introducing another one, and I got busy
> >> with other things before I could wrap it up. I hope to get back to it
> >> soon. I could send you my patches if you are interested.
> >>
> >> This complexity is one of the reasons we're focusing more on Ruby as
> >> our long-term memory system model.
> >>
> >> As far as the particular behavior you are seeing, note that the
> >> protocol is configuration-independent, so the UpgradeReq has to be
> >> passed through the L2 cache in case there are other L2 or L3 caches
> >> that need to be invalidated. Main memory is responsible for
> >> responding to the UpgradeReq since that's the point where it globally
> >> completes. There's no need for the DRAM controller to access DRAM or
> >> impose a corresponding delay though; it could just respond right away.
> >>
> >> Steve
> >>
> >> On Mon, Aug 23, 2010 at 7:15 AM, Stijn Eyerman <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I'm simulating the PARSEC benchmarks using M5 (with the manual and files
> >> > supplied by the UTAustin people). Some benchmarks execute without errors in
> >> > functional mode, but seem to get stuck in an infinite loop when using timing
> >> > simulation.
> >> >
> >> > One such is dedup with the test input. The cause is that one cpu (i.c. cpu2)
> >> > stops dispatching and issuing instructions while it is not finished.
> >> > After some days of debugging, I found this to be the cause:
> >> > - cpu3 executes a conditional store (SC)
> >> > - the cache line is present in its private L1 cache, but the status is
> >> >   shared, so an UpgradeReq-event is scheduled
> >> > - the L1 cache of cpu2 finds the same cache line and invalidates it
> >> > - the UpgradeReq is also sent to the shared L2 cache (why?), where it
> >> >   causes a miss (cache line not present)
> >> > - main memory is accessed (why?) and the store cannot continue until
> >> >   after the memory latency
> >> > - in the meanwhile cpu2 executes a LoadLocked (LL) to the same cache line
> >> > - since the cacheline was invalidated, it accesses the bus
> >> > - the L1 cache of cpu3 detects this load request, finds an mshr that waits
> >> >   on the return of data for that cache line (the UpgradeReq), attaches the
> >> >   request to that mshr's targets, and inhibits the L2 access for that
> >> >   memory operation (since it will be served by the coherence protocol)
> >> > - when the memory has served the UpgradeReq from cpu3, the SC on cpu3
> >> >   can continue
> >> > - the mshr finds another target (the LL from cpu2) but deletes it
> >> >   because the cache line is not dirty
> >> > --> see Cache::handleSnoop in cache_impl.hh:
> >> >     bool respond = blk->isDirty() && pkt->needsResponse();
> >> >     ...
> >> >     if (respond) {
> >> >         ...
> >> >     }
> >> >     else if (is_timing && is_deferred) {
> >> >         delete pkt;
> >> >     }
> >> > - the status of the LL in cpu2 remains issued, but since the request is
> >> >   deleted, no answer returns, and the cpu blocks forever
> >> >
> >> > It is probably worth to note that I use the DRAM module to simulate physical
> >> > memory. I've seen the warning that it is not tested with the current memory
> >> > model, but as far as I can deduce, this is not the cause of the error (it
> >> > just calculates and returns the memory latency).
> >> >
> >> > Can someone help me with this (complicated) problem?
> >> >
> >> > Thanks!
> >> >
> >> > Stijn
> >> >
> >> > --
> >> > dr. ir. Stijn Eyerman
> >> >
> >> > Ghent University
> >> > Electronics and Information Systems Department
> >> > Sint-Pietersnieuwstraat 41
> >> > 9000 Gent
> >> > Belgium
> >> >
> >> > t: +32 9 264 3456
> >> > f: +32 9 264 3594
> >> > e: [email protected]
> >> > w: http://www.elis.UGent.be/~seyerman/
> >> >
> >> > ----------------------------------------------------------------
> >> > This message was sent using IMP, the Internet Messaging Program.
> >> >
> >> > _______________________________________________
> >> > m5-users mailing list
> >> > [email protected]
> >> > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
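
For readers following the thread, the condition quoted from Cache::handleSnoop can be looked at in isolation. The following is only a toy sketch of the behavior as Stijn describes it, not M5's actual code: SnoopRequest, Outcome, handleDeferredSnoop, countLostRequests and the inhibitedBelow flag are all hypothetical names introduced for illustration. It assumes that a deferred request which needs a response, and whose L2/memory access was inhibited, has no other responder once it is deleted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only. These types are hypothetical and are not M5's classes.
struct SnoopRequest {
    std::string name;     // label, e.g. "LL from cpu2"
    bool needsResponse;   // true for a read that expects data back
    bool inhibitedBelow;  // true if the L2/memory access was suppressed
};

enum class Outcome { Responded, DroppedSafely, DroppedAndLost };

// Mirrors the decision quoted in the report: respond only when the block is
// dirty and the request needs a response; otherwise the deferred request is
// discarded.
Outcome handleDeferredSnoop(bool blockIsDirty, const SnoopRequest &req) {
    const bool respond = blockIsDirty && req.needsResponse;
    if (respond) {
        return Outcome::Responded;
    }
    // Assumption: if the lower-level access was inhibited, nobody else will
    // answer, so dropping a request that needs a response loses it for good.
    if (req.needsResponse && req.inhibitedBelow) {
        return Outcome::DroppedAndLost;
    }
    return Outcome::DroppedSafely;
}

// Services every snoop deferred on one MSHR and counts the requests that
// would never receive an answer.
int countLostRequests(bool blockIsDirty,
                      const std::vector<SnoopRequest> &deferred) {
    int lost = 0;
    for (const SnoopRequest &req : deferred) {
        if (handleDeferredSnoop(blockIsDirty, req) == Outcome::DroppedAndLost) {
            ++lost;
        }
    }
    return lost;
}
```

With a clean block and one deferred load whose lower-level access was inhibited, countLostRequests reports one lost request, which corresponds to the LL on cpu2 that stays issued forever in the report above. With a dirty block the same request is answered.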
