Hi, I have a patch that fixes this in classic and Ruby. I was waiting for another student (Dibakar; he runs a lot more parallel code than I do) to test it out before submitting it to the review board. I'll bug him and see if he's tested it yet.
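For anyone following along, the direction of the fix is the one Steve describes below: record the exact address range of the load-linked on the cache block, and only clear the lock when a later store truly overlaps that range. A minimal sketch of that check (illustrative names only, not the actual patch):

    #include <cstdint>
    #include <list>

    using Addr = uint64_t;  // stand-in for gem5's Addr typedef

    // Illustrative sketch, not the actual patch: a per-block load-lock that
    // remembers the exact LL range instead of implicitly covering the whole
    // cache block.
    struct LoadLock
    {
        int  contextId;   // hardware thread context that issued the LL
        Addr lowAddr;     // first byte covered by the LL
        Addr highAddr;    // last byte covered by the LL

        // Does a store to [addr, addr + size) intersect the locked range?
        bool overlaps(Addr addr, unsigned size) const
        {
            return addr <= highAddr && (addr + size - 1) >= lowAddr;
        }
    };

    // Called when a store hits the block: clear only the locks whose
    // recorded range the store actually touches, so a non-overlapping store
    // to the same line no longer kills the LL/SC pair.
    void
    clearLoadLocksOnStore(std::list<LoadLock> &locks, Addr addr, unsigned size)
    {
        locks.remove_if([addr, size](const LoadLock &lock) {
            return lock.overlaps(addr, size);
        });
    }

The overlap test is the only behavioral change relative to what's described below, where any store to the block clears the lock.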
On Thu, Oct 11, 2012 at 7:32 PM, Ali Saidi <[email protected]> wrote:

> Hi Mitch,
>
> Did you end up getting it working?
>
> Thanks,
> Ali
>
> On Sep 26, 2012, at 3:39 PM, Steve Reinhardt wrote:
>
> That's a reasonable hardware implementation. Actually you need a register
> per hardware thread context, not just per core.
>
> Our software implementation is intended to model such a hardware
> implementation, but the actual software is different for a couple of
> reasons. The main one is that we don't want to do two address-based
> lookups on every access; CAMs are much cheaper in HW than in SW.
> Associating the LL state with each cache block means you can check the
> lock state much more cheaply than iterating over a set of lock registers,
> particularly in the common case where there are no locks. Also, the cache
> typically doesn't know how many CPUs or SMT thread contexts it's
> supporting, so it's tricky to allocate the right number of registers, and
> the block-based model avoids this problem.
>
> I think you're right that the one thing we're not emulating properly is
> that the recorded lock range should be tight and not be implicitly
> expanded to cover the whole block as we've done. So you've convinced me
> that that's not just the most straightforward fix, but probably the right
> one.
>
> If you get it working, please submit the patch.
>
> Thanks!
>
> Steve
>
> On Wed, Sep 26, 2012 at 1:25 PM, Mitch Hayenga
> <[email protected]> wrote:
>
>> Hmm, I had normally thought that LL/SC were handled with special address
>> range registers at the cache controller. Since a core should really only
>> have one outstanding LL/SC pair, a register per core would suffice and
>> exactly encode the range. Basically doing the same thing that your more
>> fine-grained locks within the cache block would achieve.
>>
>> On Wed, Sep 26, 2012 at 3:08 PM, Steve Reinhardt <[email protected]> wrote:
>>
>>> This is a pretty interesting issue. I'm not sure how it would be
>>> handled in practice. Since the loads and stores in question are not to
>>> the same address, in theory at least the store-set predictor should not
>>> be involved. My guess is that the most straightforward fix would be to
>>> record the actual range of the LL in the request structure and only
>>> clear the lock flag on a store if the store truly overlaps (not just if
>>> it's to the same block).
>>>
>>> Steve
>>>
>>> On Wed, Sep 26, 2012 at 12:50 PM, Mitch Hayenga
>>> <[email protected]> wrote:
>>>
>>>> Thanks for the reply.
>>>>
>>>> Thinking about this... I don't know too much about the O3 store-set
>>>> predictor, but it would seem that load-linked instructions should care
>>>> about the entire cache line, not just whether the store happens to
>>>> overlap. It looks like the pending stores write to the address range
>>>> [0xf9c2c-0xf9c33], but the load-linked is to [0xf9c28-0xf9c2b]
>>>> (non-overlapping, same cache line). So the load issues early, but the
>>>> stores come in and clear the lock from the cache line. So either
>>>> non-LLSC stores (from the same core) shouldn't clear the locks on a
>>>> cache line (src/cache/blk.hh:279), or the store-set predictor should
>>>> hold the linked load until the stores (to the same cache line, but not
>>>> overlapping) have written back. Dibakar, another grad student here,
>>>> says this impacts Ruby as well.
>>>>
>>>> On Wed, Sep 26, 2012 at 1:27 PM, Ali Saidi <[email protected]> wrote:
>>>>
>>>>> Hi Mitch,
>>>>>
>>>>> I wonder if this happens in the steady state? With the implementation
>>>>> the store-set predictor should predict that the store is going to
>>>>> conflict with the load and order them. Perhaps that isn't getting
>>>>> trained correctly with LL/SC ops. You really don't want to mark the
>>>>> ops as serializing, as that slows down the CPU quite a bit.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ali
>>>>>
>>>>> On 26.09.2012 13:14, Mitch Hayenga wrote:
>>>>>
>>>>> Background:
>>>>> I have a non-O3, out-of-order CPU implemented in gem5. Since I don't
>>>>> have a checker implemented yet, I tend to diff committed instructions
>>>>> vs. O3. Yesterday's patches caused a few of these diffs to change
>>>>> because of load-linked/store-conditional behavior (better prediction
>>>>> on data ops that write the PC leads to denser load/store scheduling).
>>>>>
>>>>> Issue:
>>>>> It seems O3's own loads/stores can cause its
>>>>> load-linked/store-conditional pair to fail. Previously, running a
>>>>> single core under SE, every load-linked/store-conditional pair would
>>>>> succeed. Now I'm observing them failing 21% of the time (on
>>>>> single-threaded programs). Although the programs functionally work
>>>>> given how the LL/SC is coded currently, I think this points to the
>>>>> fact that LL/SC should be handled slightly differently.
>>>>>
>>>>> Example:
>>>>> Here's an example from "Hello World" on ARM+O3+Single Core+SE+Classic
>>>>> Memory that shows this. It contains locks because I assume the C++
>>>>> library is thread-safe.
>>>>> http://pastebin.com/sNjTPBWY
>>>>> The O3 CPU is effectively doing a "Test and TestAndSet". It looks
>>>>> like the load for the Test and the load-linked race for memory. Also,
>>>>> the CPU has a pending writeback to the same line. So effectively, the
>>>>> TestAndSet fails (I haven't dug into it to determine whether it was
>>>>> the racing load or the writeback that caused the failure).
>>>>> Given this, shouldn't load-linked (in this case ldrex) instructions
>>>>> be marked as non-speculative (or one of the other flags) so that they
>>>>> don't contend with earlier operations?
>>>>> Thanks.
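P.S. For comparison, the register-per-hardware-thread-context model discussed above (address-range registers at the cache controller) would look roughly like the sketch below. It is purely illustrative, with made-up names that don't exist in the tree; it just shows why per-context range registers and tight per-block range locks end up enforcing the same thing.

    #include <cstdint>
    #include <vector>

    using Addr = uint64_t;  // stand-in for gem5's Addr typedef

    // Illustrative only: one lock-address-range register per hardware
    // thread context, held at the cache controller.
    struct LockRangeReg
    {
        bool valid;      // zero-initialized to false by the vector below
        Addr lowAddr;    // first byte covered by the LL
        Addr highAddr;   // last byte covered by the LL
    };

    class LLSCMonitor
    {
      public:
        // All registers start out invalid (no outstanding reservation).
        explicit LLSCMonitor(unsigned numContexts) : regs(numContexts) {}

        // A load-linked from context `ctx` records its exact byte range.
        void loadLinked(unsigned ctx, Addr addr, unsigned size)
        {
            regs[ctx] = {true, addr, addr + size - 1};
        }

        // Every store is checked against every context's register; a store
        // that overlaps a recorded range invalidates that reservation.
        void store(Addr addr, unsigned size)
        {
            for (auto &r : regs) {
                if (r.valid && addr <= r.highAddr &&
                    addr + size - 1 >= r.lowAddr)
                    r.valid = false;
            }
        }

        // The store-conditional from `ctx` succeeds only if its own
        // register is still valid; its write then clears any overlapping
        // reservations held by other contexts.
        bool storeConditional(unsigned ctx, Addr addr, unsigned size)
        {
            bool ok = regs[ctx].valid;
            regs[ctx].valid = false;   // reservation is consumed either way
            if (ok)
                store(addr, size);
            return ok;
        }

      private:
        std::vector<LockRangeReg> regs;
    };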
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
