Hi Mitch,

Did you end up getting it working?
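In case it helps while you're putting a patch together, here's a rough, untested sketch of the kind of range-overlap check Steve describes below. The types and names are made up for illustration (they're not the actual structures in src/cache/blk.hh); the idea is just to record the exact byte range of each outstanding LL and only clear a reservation when a later store truly overlaps it:

    // Rough, untested sketch -- illustrative names only, not the real gem5 code.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct LoadLockedRecord
    {
        int contextId;      // hardware thread context that issued the LL
        uint64_t addr;      // start of the locked byte range
        unsigned size;      // size of the locked byte range
    };

    struct CacheBlockLocks
    {
        std::vector<LoadLockedRecord> lockList;  // per-block reservations

        // Record a reservation for exactly the bytes the LL touched,
        // rather than implicitly expanding it to the whole block.
        void trackLoadLocked(int ctx, uint64_t addr, unsigned size)
        {
            lockList.push_back({ctx, addr, size});
        }

        // Clear only the reservations a store actually overlaps,
        // instead of clearing every lock on the block.
        void handleStore(uint64_t storeAddr, unsigned storeSize)
        {
            auto overlaps = [&](const LoadLockedRecord &l) {
                return storeAddr < l.addr + l.size &&
                       l.addr < storeAddr + storeSize;
            };
            lockList.erase(std::remove_if(lockList.begin(), lockList.end(),
                                          overlaps),
                           lockList.end());
        }

        // An SC succeeds only if this context still holds a reservation
        // covering the bytes it writes.
        bool checkStoreConditional(int ctx, uint64_t addr, unsigned size) const
        {
            for (const auto &l : lockList) {
                if (l.contextId == ctx &&
                    addr >= l.addr && addr + size <= l.addr + l.size)
                    return true;
            }
            return false;
        }
    };

With the example from your trace, a store to [0xf9c2c-0xf9c33] would no longer clear a reservation on [0xf9c28-0xf9c2b], since the two ranges don't actually overlap even though they share a cache line.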
Thanks,
Ali

On Sep 26, 2012, at 3:39 PM, Steve Reinhardt wrote:

> That's a reasonable hardware implementation. Actually, you need a register
> per hardware thread context, not just per core.
>
> Our software implementation is intended to model such a hardware
> implementation, but the actual software is different for a couple of reasons.
> The main one is that we don't want to do two address-based lookups on every
> access; CAMs are much cheaper in HW than in SW. Associating the LL state
> with each cache block means you can check the lock state much more cheaply
> than iterating over a set of lock registers, particularly in the common case
> where there are no locks. Also, the cache typically doesn't know how many
> CPUs or SMT thread contexts it's supporting, so it's tricky to allocate the
> right number of registers, and the block-based model avoids this problem.
>
> I think you're right that the one thing we're not emulating properly is that
> the recorded lock range should be tight and not be implicitly expanded to
> cover the whole block as we've done. So you've convinced me that that's not
> just the most straightforward fix, but probably the right one.
>
> If you get it working, please submit the patch.
>
> Thanks!
>
> Steve
>
> On Wed, Sep 26, 2012 at 1:25 PM, Mitch Hayenga <[email protected]> wrote:
>
> Hmm, I had normally thought that LL/SC were handled with special
> address-range registers at the cache controller. Since a core should really
> only have one outstanding LL/SC pair, a register per core would suffice and
> exactly encode the range, basically doing the same thing that your
> finer-grained locks within the cache block would achieve.
>
> On Wed, Sep 26, 2012 at 3:08 PM, Steve Reinhardt <[email protected]> wrote:
>
> This is a pretty interesting issue. I'm not sure how it would be handled in
> practice. Since the loads and stores in question are not to the same
> address, in theory at least the store-set predictor should not be involved.
> My guess is that the most straightforward fix would be to record the actual
> range of the LL in the request structure and only clear the lock flag on a
> store if the store truly overlaps (not just if it's to the same block).
>
> Steve
>
> On Wed, Sep 26, 2012 at 12:50 PM, Mitch Hayenga <[email protected]> wrote:
>
> Thanks for the reply.
>
> Thinking about this... I don't know too much about the O3 store-set
> predictor, but it would seem that load-linked instructions should care about
> the entire cache line, not just whether a store happens to overlap. Here,
> the pending stores write to the address range [0xf9c2c-0xf9c33], but the
> load-linked is to [0xf9c28-0xf9c2b] (non-overlapping, same cache line). So
> the load issues early, but the stores come in and clear the lock from the
> cache line. So either non-LLSC stores (from the same core) shouldn't clear
> the locks on a cache line (src/cache/blk.hh:279), or the store-set predictor
> should hold the load-linked until the stores (to the same cache line, but
> not overlapping) have written back. Dibakar, another grad student here, says
> this impacts Ruby as well.
>
> On Wed, Sep 26, 2012 at 1:27 PM, Ali Saidi <[email protected]> wrote:
>
> Hi Mitch,
>
> I wonder if this happens in the steady state? With this implementation, the
> store-set predictor should predict that the store is going to conflict with
> the load and order them. Perhaps that isn't getting trained correctly with
> LL/SC ops.
> You really don't want to mark the ops as serializing, as that slows down the
> CPU quite a bit.
>
> Thanks,
> Ali
>
> On 26.09.2012 13:14, Mitch Hayenga wrote:
>
>> Background:
>>
>> I have a non-O3, out-of-order CPU implemented in gem5. Since I don't have a
>> checker implemented yet, I tend to diff committed instructions against O3.
>> Yesterday's patches caused a few of these diffs to change because of
>> load-linked/store-conditional behavior (better prediction on data ops that
>> write the PC leads to denser load/store scheduling).
>>
>> Issue:
>>
>> It seems O3's own loads/stores can cause its load-linked/store-conditional
>> pair to fail. Previously, running a single core under SE, every
>> load-linked/store-conditional pair would succeed. Now I'm observing them
>> failing 21% of the time (on single-threaded programs). Although the
>> programs functionally work given how the LL/SC is coded currently, I think
>> this points to the fact that LL/SC should be handled slightly differently.
>>
>> Example:
>>
>> Here is an example from "Hello World" on ARM+O3+Single Core+SE+Classic
>> Memory that shows this. It contains locks because I assume the C++ library
>> is thread-safe.
>> http://pastebin.com/sNjTPBWY
>>
>> The O3 CPU is effectively doing a "Test and TestAndSet". It looks like the
>> load for the Test and the load-linked race for memory, and the CPU also has
>> a pending writeback to the same line. So effectively the TestAndSet fails
>> (I haven't dug into it to determine whether it was the racing load or the
>> writeback that caused the failure).
>>
>> Given this, shouldn't load-linked (in this case ldrex) instructions be
>> marked as non-speculative (or with one of the other flags) so that they
>> don't contend with earlier operations?
>>
>> Thanks.
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
