Re: [m5-users] Possible bug with LLSC instructions and coherence protocol?

Lesha Jolondz Sun, 17 Oct 2010 02:00:22 -0700

Thank you very much,

The patches have made PARSEC execution on the M5 simulator much more stable.
However, there are still some hidden bugs.

I have successfully run all 11 PARSEC benchmarks with the small input.
In addition, I have run 4 benchmarks on 8 configurations, each with a
different L2 latency. Only one of those configurations hung.

Regards,
Aleksei



On Sun, Oct 3, 2010 at 4:37 PM, Steve Reinhardt <[email protected]> wrote:

> The patches I referred to have since been committed; if you're using
> the latest code from the development repository, you've got them.
>
> Steve
>
> On Sun, Oct 3, 2010 at 6:04 AM, Lesha Jolondz <[email protected]>
> wrote:
> > Hi Steve,
> >
> > I am experiencing the same problem running PARSEC benchmarks on a 4-core
> > configuration with a shared L2 cache. Could you please send me your
> > patches?
> >
> > Thanks,
> > Aleksei
> >
> > On Tue, Aug 24, 2010 at 12:27 AM, Steve Reinhardt <[email protected]>
> wrote:
> >>
> >> Hi Stijn,
> >>
> >> It's true that there are subtle bugs in the coherence protocol that
> >> seem to appear only when you use a different DRAM model that creates
> >> different timings.  I spent a fair amount of time a month or two ago
> >> to try and fix things up, and I made some progress, but it's hard to
> >> fix one subtle bug without introducing another one, and I got busy
> >> with other things before I could wrap it up.  I hope to get back to it
> >> soon.  I could send you my patches if you are interested.
> >>
> >> This complexity is one of the reasons we're focusing more on Ruby as
> >> our long-term memory system model.
> >>
> >> As far as the particular behavior you are seeing, note that the
> >> protocol is configuration-independent, so the UpgradeReq has to be
> >> passed through the L2 cache in case there are other L2 or L3 caches
> >> that need to be invalidated.  Main memory is responsible for
> >> responding to the UpgradeReq since that's the point where it globally
> >> completes.  There's no need for the DRAM controller to access DRAM or
> >> impose a corresponding delay though; it could just respond right away.
> >>
> >> Steve
> >>
> >> On Mon, Aug 23, 2010 at 7:15 AM, Stijn Eyerman
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I'm simulating the PARSEC benchmarks using M5 (with the manual and files
> >> > supplied by the UT Austin people). Some benchmarks execute without errors
> >> > in functional mode, but seem to get stuck in an infinite loop when using
> >> > timing simulation.
> >> >
> >> > One such benchmark is dedup with the test input. The problem is that one
> >> > CPU (in this case, cpu2) stops dispatching and issuing instructions even
> >> > though it has not finished. After some days of debugging, I found this
> >> > to be the cause:
> >> > - cpu3 executes a store-conditional (SC)
> >> > - the cache line is present in its private L1 cache, but its state is
> >> >   shared, so an UpgradeReq event is scheduled
> >> > - the L1 cache of cpu2 finds the same cache line and invalidates it
> >> > - the UpgradeReq is also sent to the shared L2 cache (why?), where it
> >> >   misses (the cache line is not present)
> >> > - main memory is accessed (why?), and the store cannot continue until
> >> >   the memory latency has elapsed
> >> > - meanwhile, cpu2 executes a LoadLocked (LL) to the same cache line
> >> > - since the cache line was invalidated, it accesses the bus
> >> > - the L1 cache of cpu3 detects this load request, finds an MSHR that is
> >> >   waiting for the data for that cache line (the UpgradeReq), attaches
> >> >   the request to that MSHR's targets, and inhibits the L2 access for
> >> >   that memory operation (since it will be served by the coherence
> >> >   protocol)
> >> > - when memory has served the UpgradeReq from cpu3, the SC on cpu3 can
> >> >   continue
> >> > - the MSHR finds another target (the LL from cpu2) but deletes it
> >> >   because the cache line is not dirty
> >> >   --> see Cache::handleSnoop in cache_impl.hh:
> >> >                bool respond = blk->isDirty() && pkt->needsResponse();
> >> >                ...
> >> >                if (respond){
> >> >                ...
> >> >                }
> >> >                else if (is_timing && is_deferred) {
> >> >                    delete pkt;
> >> >                }
> >> > - the LL in cpu2 remains in the issued state, but since the request was
> >> >   deleted, no response ever returns, and the CPU blocks forever
> >> >
> >> > It is probably worth noting that I use the DRAM module to simulate
> >> > physical memory. I've seen the warning that it has not been tested with
> >> > the current memory model, but as far as I can tell, this is not the
> >> > cause of the error (it just calculates and returns the memory latency).
> >> >
> >> > Can someone help me with this (complicated) problem?
> >> >
> >> > Thanks!
> >> >
> >> > Stijn
> >> >
> >> > --
> >> > dr. ir. Stijn Eyerman
> >> >
> >> > Ghent University
> >> > Electronics and Information Systems Department
> >> > Sint-Pietersnieuwstraat 41
> >> > 9000 Gent
> >> > Belgium
> >> >
> >> > t: +32 9 264 3456
> >> > f: +32 9 264 3594
> >> > e: [email protected]
> >> > w: http://www.elis.UGent.be/~seyerman/
> >> >
> >> >
> >> > _______________________________________________
> >> > m5-users mailing list
> >> > [email protected]
> >> > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >> >
> >
> >
> >
>

