Hi,

I'm simulating the PARSEC benchmarks using M5 (with the manual and files supplied by the UTAustin people). Some benchmarks execute without errors in functional mode, but seem to get stuck in an infinite loop when using timing simulation.

One such benchmark is dedup with the test input. The simulation hangs because one CPU (in this case cpu2) stops dispatching and issuing instructions before it has finished.
After some days of debugging, I traced this to the following sequence:
- cpu3 executes a store-conditional (SC)
- the cache line is present in its private L1 cache, but its state is shared, so an UpgradeReq event is scheduled
- the L1 cache of cpu2 finds the same cache line and invalidates it
- the UpgradeReq is also sent to the shared L2 cache (why?), where it misses because the cache line is not present
- main memory is accessed (why?), and the store cannot continue until the memory latency has elapsed
- in the meantime, cpu2 executes a load-locked (LL) to the same cache line
- since the cache line was invalidated, the LL misses and goes out on the bus
- the L1 cache of cpu3 snoops this load request, finds an MSHR that is waiting for data for that cache line (the UpgradeReq), attaches the request to that MSHR's targets, and inhibits the L2 access for that memory operation (since it will be served by the coherence protocol)
- when memory has served the UpgradeReq from cpu3, the SC on cpu3 can continue
- the MSHR finds another target (the LL from cpu2) but deletes it because the cache line is not dirty
   --> see Cache::handleSnoop in cache_impl.hh:
       bool respond = blk->isDirty() && pkt->needsResponse();
       ...
       if (respond) {
           ...
       } else if (is_timing && is_deferred) {
           delete pkt;
       }
- the LL on cpu2 remains in the issued state, but since the request was deleted, no response ever returns and the CPU blocks forever
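
The sequence above can be reduced to a small toy model. This is only a sketch and not M5 code: the types ToyPacket and ToyMshr, their fields, and all function names are hypothetical and exist only to mirror the steps as described. The one piece taken from the source is the `respond` condition from the handleSnoop excerpt. Under these assumptions, the LL is claimed by the snooping L1 (so the lower level stays silent) and is then dropped when the block is not dirty.

```cpp
#include <cassert>
#include <vector>

// Toy model of the sequence described above. None of these types are M5
// classes; names and fields are hypothetical and only mirror the description.
struct ToyPacket {
    int srcCpu;          // CPU that issued the request
    bool needsResponse;  // true for a load such as the LL
    bool memInhibit;     // set when a snooping cache claims the request
    bool answered;       // set once some agent has supplied the data
};

struct ToyMshr {
    std::vector<ToyPacket*> deferredTargets;  // snoops parked behind the miss
};

// cpu3's L1 snoops cpu2's LL while its own UpgradeReq is still outstanding.
void snoopWhileUpgradePending(ToyMshr& mshr, ToyPacket& pkt) {
    mshr.deferredTargets.push_back(&pkt);  // attach to the MSHR's targets
    pkt.memInhibit = true;                 // L2/memory will not service it
}

// The lower level (L2/memory) sees the request on the bus.
void lowerLevelAccess(ToyPacket& pkt) {
    if (!pkt.memInhibit) {
        pkt.answered = true;  // answers only when nobody claimed the request
    }
}

// The UpgradeReq completes and the deferred snoops are processed, using the
// same condition as the quoted handleSnoop excerpt.
void serviceDeferredSnoops(ToyMshr& mshr, bool blkIsDirty) {
    for (ToyPacket* pkt : mshr.deferredTargets) {
        bool respond = blkIsDirty && pkt->needsResponse;
        if (respond) {
            pkt->answered = true;
        }
        // Otherwise the packet is dropped and the requester hears nothing.
    }
    mshr.deferredTargets.clear();
}

// Runs the whole sequence and reports whether cpu2's LL ever gets data.
bool loadLockedGetsResponse(bool blkIsDirty) {
    ToyMshr upgradeMshr;                  // cpu3's pending UpgradeReq
    ToyPacket ll{2, true, false, false};  // cpu2's LL after the invalidation
    snoopWhileUpgradePending(upgradeMshr, ll);
    lowerLevelAccess(ll);
    serviceDeferredSnoops(upgradeMshr, blkIsDirty);
    return ll.answered;
}
```

Calling loadLockedGetsResponse(false) corresponds to the case I observe (block not dirty when the deferred snoop is handled), and loadLockedGetsResponse(true) to the case where the owner would supply the data.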

It is probably worth noting that I use the DRAM module to simulate physical memory. I've seen the warning that it has not been tested with the current memory model, but as far as I can tell, this is not the cause of the error (it just calculates and returns the memory latency).

Can someone help me with this (complicated) problem?

Thanks!

Stijn

--
dr. ir. Stijn Eyerman

Ghent University
Electronics and Information Systems Department
Sint-Pietersnieuwstraat 41
9000 Gent
Belgium

t: +32 9 264 3456
f: +32 9 264 3594
e: [email protected]
w: http://www.elis.UGent.be/~seyerman/

_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
