Hi Mitch,

Did you end up getting it working?
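In case it helps while you're putting a patch together, here's a rough, untested sketch of the kind of range-overlap check Steve describes below. The types and names are made up for illustration (they're not the actual structures in src/cache/blk.hh); the idea is just to record the exact byte range of each outstanding LL and only clear a reservation when a later store truly overlaps it:

    // Rough, untested sketch -- illustrative names only, not the real gem5 code.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct LoadLockedRecord
    {
        int contextId;      // hardware thread context that issued the LL
        uint64_t addr;      // start of the locked byte range
        unsigned size;      // size of the locked byte range
    };

    struct CacheBlockLocks
    {
        std::vector<LoadLockedRecord> lockList;  // per-block reservations

        // Record a reservation for exactly the bytes the LL touched,
        // rather than implicitly expanding it to the whole block.
        void trackLoadLocked(int ctx, uint64_t addr, unsigned size)
        {
            lockList.push_back({ctx, addr, size});
        }

        // Clear only the reservations a store actually overlaps,
        // instead of clearing every lock on the block.
        void handleStore(uint64_t storeAddr, unsigned storeSize)
        {
            auto overlaps = [&](const LoadLockedRecord &l) {
                return storeAddr < l.addr + l.size &&
                       l.addr < storeAddr + storeSize;
            };
            lockList.erase(std::remove_if(lockList.begin(), lockList.end(),
                                          overlaps),
                           lockList.end());
        }

        // An SC succeeds only if this context still holds a reservation
        // covering the bytes it writes.
        bool checkStoreConditional(int ctx, uint64_t addr, unsigned size) const
        {
            for (const auto &l : lockList) {
                if (l.contextId == ctx &&
                    addr >= l.addr && addr + size <= l.addr + l.size)
                    return true;
            }
            return false;
        }
    };

With the example from your trace, a store to [0xf9c2c-0xf9c33] would no longer clear a reservation on [0xf9c28-0xf9c2b], since the two ranges don't actually overlap even though they share a cache line.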
Thanks,
Ali

On Sep 26, 2012, at 3:39 PM, Steve Reinhardt wrote:

> That's a reasonable hardware implementation. Actually, you need a register
> per hardware thread context, not just per core.
>
> Our software implementation is intended to model such a hardware
> implementation, but the actual software is different for a couple of reasons.
> The main one is that we don't want to do two address-based lookups on every
> access; CAMs are much cheaper in HW than in SW. Associating the LL state
> with each cache block means you can check the lock state much more cheaply
> than iterating over a set of lock registers, particularly in the common case
> where there are no locks. Also, the cache typically doesn't know how many
> CPUs or SMT thread contexts it's supporting, so it's tricky to allocate the
> right number of registers, and the block-based model avoids this problem.
>
> I think you're right that the one thing we're not emulating properly is that
> the recorded lock range should be tight and not be implicitly expanded to
> cover the whole block as we've done. So you've convinced me that that's not
> just the most straightforward fix, but probably the right one.
>
> If you get it working, please submit the patch.
>
> Thanks!
>
> Steve
>
> On Wed, Sep 26, 2012 at 1:25 PM, Mitch Hayenga <[email protected]> wrote:
>
> Hmm, I had normally thought that LL/SC were handled with special
> address-range registers at the cache controller. Since a core should really
> only have one outstanding LL/SC pair, a register per core would suffice and
> exactly encode the range, basically doing the same thing that your
> finer-grained locks within the cache block would achieve.
>
> On Wed, Sep 26, 2012 at 3:08 PM, Steve Reinhardt <[email protected]> wrote:
>
> This is a pretty interesting issue. I'm not sure how it would be handled in
> practice. Since the loads and stores in question are not to the same
> address, in theory at least the store-set predictor should not be involved.
> My guess is that the most straightforward fix would be to record the actual
> range of the LL in the request structure and only clear the lock flag on a
> store if the store truly overlaps (not just if it's to the same block).
>
> Steve
>
> On Wed, Sep 26, 2012 at 12:50 PM, Mitch Hayenga <[email protected]> wrote:
>
> Thanks for the reply.
>
> Thinking about this... I don't know too much about the O3 store-set
> predictor, but it would seem that load-linked instructions should care about
> the entire cache line, not just whether a store happens to overlap. Here,
> the pending stores write to the address range [0xf9c2c-0xf9c33], but the
> load-linked is to [0xf9c28-0xf9c2b] (non-overlapping, same cache line). So
> the load issues early, but the stores come in and clear the lock from the
> cache line. So either non-LLSC stores (from the same core) shouldn't clear
> the locks on a cache line (src/cache/blk.hh:279), or the store-set predictor
> should hold the load-linked until the stores (to the same cache line, but
> not overlapping) have written back. Dibakar, another grad student here, says
> this impacts Ruby as well.
>
> On Wed, Sep 26, 2012 at 1:27 PM, Ali Saidi <[email protected]> wrote:
>
> Hi Mitch,
>
> I wonder if this happens in the steady state? With this implementation, the
> store-set predictor should predict that the store is going to conflict with
> the load and order them. Perhaps that isn't getting trained correctly with
> LL/SC ops.
> You really don't want to mark the ops as serializing, as that slows down the
> CPU quite a bit.
>
> Thanks,
> Ali
>
> On 26.09.2012 13:14, Mitch Hayenga wrote:
>
>> Background:
>>
>> I have a non-O3, out-of-order CPU implemented in gem5. Since I don't have a
>> checker implemented yet, I tend to diff committed instructions against O3.
>> Yesterday's patches caused a few of these diffs to change because of
>> load-linked/store-conditional behavior (better prediction on data ops that
>> write the PC leads to denser load/store scheduling).
>>
>> Issue:
>>
>> It seems O3's own loads/stores can cause its load-linked/store-conditional
>> pair to fail. Previously, running a single core under SE, every
>> load-linked/store-conditional pair would succeed. Now I'm observing them
>> failing 21% of the time (on single-threaded programs). Although the
>> programs functionally work given how the LL/SC is coded currently, I think
>> this points to the fact that LL/SC should be handled slightly differently.
>>
>> Example:
>>
>> Here is an example from "Hello World" on ARM+O3+Single Core+SE+Classic
>> Memory that shows this. It contains locks because I assume the C++ library
>> is thread-safe.
>> http://pastebin.com/sNjTPBWY
>>
>> The O3 CPU is effectively doing a "Test and TestAndSet". It looks like the
>> load for the Test and the load-linked race for memory, and the CPU also has
>> a pending writeback to the same line. So effectively the TestAndSet fails
>> (I haven't dug into it to determine whether it was the racing load or the
>> writeback that caused the failure).
>>
>> Given this, shouldn't load-linked (in this case ldrex) instructions be
>> marked as non-speculative (or with one of the other flags) so that they
>> don't contend with earlier operations?
>>
>> Thanks.
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
