Any updates, Mitch?
Thanks,
Ali

On 11.10.2012 20:44, Mitch Hayenga wrote:

> Hi,
>
> I have a patch that fixes this in classic and ruby. I was waiting for another student (Dibakar, he runs a lot more parallel code than I do) to test it out before submitting to the reviewboard. I'll bug him and see if he's tested it out yet.
>
> On Thu, Oct 11, 2012 at 7:32 PM, Ali Saidi <[email protected]> wrote:
>
>> Hi Mitch,
>>
>> Did you end up getting it working?
>>
>> Thanks,
>> Ali
>>
>> On Sep 26, 2012, at 3:39 PM, Steve Reinhardt wrote:
>>
>>> That's a reasonable hardware implementation. Actually you need a register per hardware thread context, not just per core.
>>>
>>> Our software implementation is intended to model such a hardware implementation, but the actual software is different for a couple of reasons. The main one is that we don't want to do two address-based lookups on every access; CAMs are much cheaper in HW than in SW. Associating the LL state with each cache block means you can check the lock state much more cheaply than iterating over a set of lock registers, particularly in the common case where there are no locks. Also, the cache typically doesn't know how many CPUs or SMT thread contexts it's supporting, so it's tricky to allocate the right number of registers; the block-based model avoids this problem.
>>>
>>> I think you're right that the one thing we're not emulating properly is that the recorded lock range should be tight and not be implicitly expanded to cover the whole block as we've done. So you've convinced me that that's not just the most straightforward fix, but probably the right one.
>>>
>>> If you get it working, please submit the patch.
>>>
>>> Thanks!
>>>
>>> Steve
>>>
>>> On Wed, Sep 26, 2012 at 1:25 PM, Mitch Hayenga <[email protected]> wrote:
>>>
>>>> Hmm, I had normally thought that LL/SC were handled with special address-range registers at the cache controller.
>>>> Since a core should really only have one outstanding LL/SC pair, a register per core would suffice and exactly encode the range, basically doing the same thing that your finer-grained locks within the cache block would achieve.
>>>>
>>>> On Wed, Sep 26, 2012 at 3:08 PM, Steve Reinhardt <[email protected]> wrote:
>>>>
>>>>> This is a pretty interesting issue. I'm not sure how it would be handled in practice. Since the loads and stores in question are not to the same address, in theory at least the store-set predictor should not be involved. My guess is that the most straightforward fix would be to record the actual range of the LL in the request structure and only clear the lock flag on a store if the store truly overlaps (not just if it's to the same block).
>>>>>
>>>>> Steve
>>>>>
>>>>> On Wed, Sep 26, 2012 at 12:50 PM, Mitch Hayenga <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> Thinking about this... I don't know too much about the O3 store-set predictor, but it would seem that load-linked instructions should care about the entire cache line, not just whether a store happens to overlap. It looks like the pending stores write to the address range [0xf9c2c-0xf9c33], but the load-linked is to [0xf9c28-0xf9c2b] (non-overlapping, same cache line). So the load issues early, but the stores come in and clear the lock from the cache line. So either non-LLSC stores (from the same core) shouldn't clear the locks on a cache line (src/cache/blk.hh:279), or the store-set predictor should hold the linked load until the stores (to the same cache line, but not overlapping) have written back. Dibakar, another grad student here, says this impacts Ruby as well.
>>>>>>
>>>>>> On Wed, Sep 26, 2012 at 1:27 PM, Ali Saidi <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Mitch,
>>>>>>>
>>>>>>> I wonder if this happens in the steady state? With the implementation the store-set predictor should predict that the store is going to conflict with the load and order them.
>>>>>>> Perhaps that isn't getting trained correctly with LLSC ops. You really don't want to mark the ops as serializing, as that slows down the CPU quite a bit.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ali
>>>>>>>
>>>>>>> On 26.09.2012 13:14, Mitch Hayenga wrote:
>>>>>>>
>>>>>>>> Background:
>>>>>>>> I have a non-O3, out-of-order CPU implemented on gem5. Since I don't have a checker implemented yet, I tend to diff committed instructions vs. O3. Yesterday's patches caused a few of these diffs to change because of load-linked/store-conditional behavior (better prediction on data ops that write the PC leads to denser load/store scheduling).
>>>>>>>>
>>>>>>>> Issue:
>>>>>>>> It seems O3's own loads/stores can cause its load-linked/store-conditional pair to fail. Previously, running a single core under SE, every load-linked/store-conditional pair would succeed. Now I'm observing them failing 21% of the time (on single-threaded programs). Although the programs functionally work given how the LL/SC is coded currently, I think this points to the fact that LL/SC should be handled slightly differently.
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> From "Hello World" on ARM+O3+Single Core+SE+Classic Memory that shows this. This contains locks because I assume the C++ library is thread-safe.
>>>>>>>> http://pastebin.com/sNjTPBWY
>>>>>>>>
>>>>>>>> The O3 CPU is effectively doing a "Test and TestAndSet". It looks like the load for the Test and the load-linked race for memory. Also, the CPU has a pending writeback to the same line. So effectively, the TestAndSet fails (I haven't dug into it to determine whether it was the racing load or the writeback that caused the failure).
>>>>>>>>
>>>>>>>> Given this, shouldn't load-linked (in this case ldrex) instructions be marked as non-speculative (or one of the other flags) so that they don't contend with earlier operations?
>>>>>>>>
>>>>>>>> Thanks.
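For reference, the byte-range overlap test Steve suggests can be sketched in a few lines of C++. This is a standalone illustration, not gem5's actual code: the `LockRecord` struct, its member names, and the 64-byte line size are assumptions. The arithmetic shows why the pending stores [0xf9c2c-0xf9c33] from the trace share a cache line with the load-linked range [0xf9c28-0xf9c2b] without sharing a single byte, so a tight-range lock would survive where a block-granularity lock is cleared.

```cpp
#include <cstdint>

// Hypothetical sketch of the proposed fix: record the exact byte range
// of a load-linked and clear the lock only when a later store truly
// overlaps that range, rather than whenever it touches the same block.
struct LockRecord {
    uint64_t addr;   // first byte of the LL access
    unsigned size;   // size of the LL access in bytes
    bool valid;      // lock still held?

    // Half-open interval intersection test against a store's range.
    bool overlaps(uint64_t sAddr, unsigned sSize) const {
        return sAddr < addr + size && addr < sAddr + sSize;
    }

    // Ordinary (non-SC) store seen by the cache: with the tight-range
    // rule, a same-block but non-overlapping store leaves the lock alone.
    void onStore(uint64_t sAddr, unsigned sSize) {
        if (valid && overlaps(sAddr, sSize))
            valid = false;   // a later SC will now fail
    }
};

// Assumed 64-byte cache line, just for the same-line demonstration.
inline uint64_t lineOf(uint64_t addr) { return addr >> 6; }
```

With the addresses above, `lineOf(0xf9c28) == lineOf(0xf9c33)` while `overlaps(0xf9c2c, 8)` is false, which is exactly the case where the current block-granularity check clears the lock but a tight-range check would not.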
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
