Hi Rick,
Thanks for pointing this out! I believe I have found the problem. I have an
updated version that fixes the bug, and I can send it to you if you'd like.
If you'd rather stay with the old version, you could simply try your method
of making loadLink exclusive (by adding the NeedExclusive property to the
LoadLinked MemCmd). I'll post a link to the updated patch in a day or two.
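In case it helps, here is a rough sketch of that idea. This is not the real
MemCmd code; the table layout and the attribute names (including NeedExclusive)
are made up for illustration. The point is just that if the load-linked command
itself asks for exclusive permission, the block already arrives with write
permission and the later StoreCond never needs the Shared-to-Modified upgrade
described below.

#include <bitset>
#include <cstdio>

// Illustrative stand-in for an m5-style command attribute table.
enum CmdAttrib { IsRead, IsWrite, IsLlsc, NeedExclusive, NumAttribs };

struct CommandInfo {
    const char *name;
    std::bitset<NumAttribs> attribs;
};

static const CommandInfo commands[] = {
    { "ReadReq",       std::bitset<NumAttribs>().set(IsRead) },
    // Marking the load-linked request NeedExclusive means the LL fetch
    // already brings the block in with write permission.
    { "LoadLinkedReq", std::bitset<NumAttribs>()
                           .set(IsRead).set(IsLlsc).set(NeedExclusive) },
    { "StoreCondReq",  std::bitset<NumAttribs>()
                           .set(IsWrite).set(IsLlsc).set(NeedExclusive) },
};

int main()
{
    for (const auto &c : commands)
        std::printf("%-14s needExclusive=%d\n",
                    c.name, static_cast<int>(c.attribs.test(NeedExclusive)));
    return 0;
}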
Here is how the bug arises:
When a cpu issues a StoreCond and its D-cache holds the load-linked block in
Shared state, the D-cache needs to upgrade that block from Shared to Modified.
In the context of MSI coherence this is treated as a replacement, which is done
in two phases:
a) Evict: the cache evicts its current copy (it sends an InvalidateReq to the
L2 and waits for an InvalidateResp; to avoid naming confusion, let's call this
EvictReq/EvictResp, as opposed to the InvalidateReq that the L2 sends to an L1
upon a remote exclusive access).
b) Upgrade: after the EvictResp is received, the cache sends a ReadExReq to the
L2 and waits for the ReadExResp.
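To make the window concrete, here is an abstract model of that two-phase
upgrade (not the actual cache code; the function and message names are
invented). Nothing ties phase (a) to phase (b), which is exactly where the
trouble starts:

#include <cstdio>

enum class L1State { Invalid, Shared, Modified };

static L1State blk = L1State::Shared;   // the load-linked block, currently Shared

static void sendEvictReq()   { std::puts("L1 -> L2: EvictReq"); }
static void recvEvictResp()  { std::puts("L2 -> L1: EvictResp"); blk = L1State::Invalid; }
static void sendReadExReq()  { std::puts("L1 -> L2: ReadExReq"); }
static void recvReadExResp() { std::puts("L2 -> L1: ReadExResp"); blk = L1State::Modified; }

int main()
{
    // Phase (a): give up the Shared copy.
    sendEvictReq();
    recvEvictResp();

    // Window: the block is Invalid here and the L2 has no record that this
    // cache is in the middle of an upgrade, so another cache's phase (a)
    // and (b) can slip in between.

    // Phase (b): re-fetch the block with write permission.
    sendReadExReq();
    recvReadExResp();

    std::printf("final L1 state: %s\n",
                blk == L1State::Modified ? "Modified" : "not Modified");
    return 0;
}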
If another StoreCond arrives later, we expect it either to miss in the D-cache
and fail, or to hit but have its upgrade fail, in which case it is nacked and
resent (and will eventually miss and fail).
However, this two-phase approach does not meet that expectation, because the
two phases do not happen atomically. A separate problem with the two-phase
approach is that it can double the miss latency. I've addressed both issues in
the updated version.
Here is how the store-conditional cheated:
1. cpu0 and cpu1 both successfully perform a load-linked (a shared_read).
2. cpu0 and cpu1 issue store-conditionals at roughly the same time (tick
8070266133500). Both hit but need to upgrade, so both start phase (a) to evict
the block.
3. Because cpu0(a) and cpu0(b) are not combined atomically, cpu0(a) can be
followed by cpu1(a). The L2 block now goes from Shared to Uncached, because
all the upper-level D-caches have evicted their copies.
4. After that, both cpu0 and cpu1 send a ReadEx to the L2. The L2 first
satisfies cpu0(b), making cpu0's StoreCond succeed; it then invalidates the
copy it just sent to cpu0 and satisfies cpu1(b). In the end, both
store-conditionals succeed, because each loads a fresh block from the L2 and
the block records no locks.
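Here is the same interleaving replayed in an abstract model of the L2 side
(again, not m5 code; the structures are made up for illustration). Because
each EvictReq and ReadExReq is handled as an independent request, the L2 never
detects a conflict and grants both upgrades:

#include <cstdio>
#include <set>

struct L2Line {
    std::set<int> sharers;   // cpus holding the line in Shared
    int owner = -1;          // cpu holding the line in Modified, or -1 for none
};

static void evict(L2Line &l, int cpu)
{
    l.sharers.erase(cpu);
    std::printf("cpu%d(a) EvictReq  -> sharers left: %zu%s\n",
                cpu, l.sharers.size(),
                l.sharers.empty() ? " (line now Uncached at L2)" : "");
}

static void readEx(L2Line &l, int cpu)
{
    if (l.owner != -1 && l.owner != cpu)
        std::printf("    L2 invalidates cpu%d's copy first\n", l.owner);
    l.sharers.clear();
    l.owner = cpu;
    std::printf("cpu%d(b) ReadExReq -> granted; cpu%d is Modified, its SC succeeds\n",
                cpu, cpu);
}

int main()
{
    L2Line line;
    line.sharers = {0, 1};   // both load-linked hits left the line Shared in cpu0 and cpu1

    evict(line, 0);   // step 3: cpu0(a) ...
    evict(line, 1);   //         ... immediately followed by cpu1(a)
    readEx(line, 0);  // step 4: cpu0's upgrade granted, its SC succeeds
    readEx(line, 1);  //         cpu1's upgrade also granted -> its SC also "succeeds"
    return 0;
}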
The updated version avoids this problem by combining (a) and (b) into one
packet. In the same scenario, cpu0(b) now directly follows cpu0(a) as a single
Shared_ReadEx transaction at the L2, which issues an InvalidateReq to the copy
at cpu1. cpu1's combined EvictReq+ReadExReq arrives before the response
(InvalidateResp) and therefore conflicts with the ongoing Shared_ReadEx
transaction. As a result, cpu1(a) is nacked and retried over and over until
cpu1's block has been invalidated, at which point its store-conditional
correctly fails.
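And a similarly abstract sketch of the conflict check in the fixed scheme
(this is not the actual patch; the names and the std::map bookkeeping are only
illustrative). Phases (a) and (b) travel as one Shared_ReadEx, the L2 opens a
transaction for the block, and any other request for that block is nacked
until the outstanding InvalidateResp comes back:

#include <cstdio>
#include <map>

struct L2 {
    // block address -> cpu whose Shared_ReadEx transaction is in flight
    std::map<unsigned long long, int> pending;

    bool sharedReadEx(unsigned long long addr, int cpu)
    {
        auto it = pending.find(addr);
        if (it != pending.end() && it->second != cpu) {
            std::printf("cpu%d Shared_ReadEx on %#llx: NACK (cpu%d's transaction in flight)\n",
                        cpu, addr, it->second);
            return false;               // requester must retry later
        }
        // Open the transaction and (conceptually) send InvalidateReq to the
        // other sharers in one shot -- phases (a) and (b) stay together.
        pending[addr] = cpu;
        std::printf("cpu%d Shared_ReadEx on %#llx: accepted, invalidating other sharers\n",
                    cpu, addr);
        return true;
    }

    void invalidateResp(unsigned long long addr)
    {
        pending.erase(addr);            // all other copies gone, transaction complete
        std::printf("InvalidateResp for %#llx: transaction closed\n", addr);
    }
};

int main()
{
    L2 l2;
    // The lock address from the trace, rounded down to an (assumed) 64-byte block.
    const unsigned long long addr = 0xfffffc001f4aaf28ULL & ~0x3FULL;

    l2.sharedReadEx(addr, 0);  // cpu0's combined upgrade starts; InvalidateReq goes to cpu1
    l2.sharedReadEx(addr, 1);  // cpu1's upgrade conflicts -> nacked; it retries until its
                               // own copy is invalidated, and its StoreCond then fails
    l2.invalidateResp(addr);   // cpu1's copy acked as invalidated; cpu0's transaction completes
    return 0;
}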
Regards,
Jiayuan
> From: [email protected]
> To: [email protected]; [email protected]
> Date: Wed, 25 Feb 2009 01:29:05 -0500
> Subject: Re: [m5-users] Directory coherence, implementing uncacheable
> load-linked/store-conditional
>
>
>
>
> On Feb 24, 2009, at 7:41 PM, Rick Strong wrote:
>
>> I believe that currently, the directory coherence implementation is
>> suffering from an incomplete implementation of uncacheable
>> load-linked/store conditional. It appears below that a lock is being
>> grabbed for address A=0xffff-fc00-1f4a-af28 by two cpus
>> simultaneously
>> at time 8070266129000 in @_read_lock. CPU0 performs stl_c and returns
>> followed by CPU1 succeeding with its stl_c. On further research into
>> the ALPHA TLB implementation, TLB::checkCacheability(RequestPtr &req,
>> bool itb) sets the req to uncacheable if (req->getPaddr() &
>> PAddrUncachedBit43), i.e., if bit 43 is set in the address. This is the
>> case for A=0xffff-fc00-1f4a-af28 unless I am going blind, which is
>> possible. So a few things I wanted to confirm with some experts in the
>> memory system:
>>
>> 1) Is this indeed an uncacheable address?
> It looks like an alpha super page address to me, in which case the
> part of the translate function before the checkCacheability() call
> should mask off the high address bits, so bit 43 will not be set.
>>
>>
>> 2) Should alpha support stl_c and ldl_l for uncacheable accesses?
> I would have to look at the Alpha 21264 reference manual to be sure,
> but I'm pretty sure that load-locked/store-conditional only works on
> cacheable addresses.
>
>>
>> 3) How should I efficiently implement stl_c and ldl_l for directory
>> coherence without having to maintain a global structure somewhere?
>> Ultimately, I want the directory coherence to work on a mesh in full
>> system, so I can't just sniff the bus. Any ideas?
>
>>
>>
>>
>> Best,
>> -Rick
>>
>> *The Trace:*
>> 8070266029500: server.detail_cpu0 T0 : @ext2_get_branch+156 :
>> jsr r26,(r27) : IntAlu : D=0xfffffc00003ee62c
>> 8070266045000: server.detail_cpu1 T0 : @ext2_get_branch+152 :
>> ldq r27,-14856(r29) : MemRead : D=0xfffffc00005e86f8
>> A=0xfffffc0000787cf8
>> 8070266046500: server.detail_cpu1 T0 : @ext2_get_branch+156 :
>> jsr r26,(r27) : IntAlu : D=0xfffffc00003ee62c
>> 8070266129000: server.detail_cpu0 T0 : @_read_lock : ldl_l
>> r1,0(r16) : MemRead : D=0x0000000000000000 A=0xfffffc001f4aaf28
>> 8070266130500: server.detail_cpu1 T0 : @_read_lock : ldl_l
>> r1,0(r16) : MemRead : D=0x0000000000000000 A=0xfffffc001f4aaf28
>> 8070266130500: server.detail_cpu0 T0 : @_read_lock+4 : blbs
>> r1,0xfffffc00005e89c4 : IntAlu :
>> 8070266132000: server.detail_cpu0 T0 : @_read_lock+8 : subl
>> r1,2,r1 : IntAlu : D=0xfffffffffffffffe
>> 8070266132000: server.detail_cpu1 T0 : @_read_lock+4 : blbs
>> r1,0xfffffc00005e89c4 : IntAlu :
>> 8070266133500: server.detail_cpu1 T0 : @_read_lock+8 : subl
>> r1,2,r1 : IntAlu : D=0xfffffffffffffffe
>> 8070266166500: server.detail_cpu0 T0 : @_read_lock+12 : stl_c
>> r1,0(r16) : MemWrite : D=0x0000000000000001
>> A=0xfffffc001f4aaf28
>> 8070266168000: server.detail_cpu0 T0 : @_read_lock+16 : beq
>> r1,0xfffffc00005e89c4 : IntAlu :
>> 8070266169500: server.detail_cpu0 T0 : @_read_lock+20 :
>> mb : MemRead :
>> 8070266171000: server.detail_cpu0 T0 : @_read_lock+24 : ret
>> (r26) : IntAlu :
>> 8070266172500: server.detail_cpu0 T0 : @ext2_get_branch+160 :
>> ldah r29,58(r26) : IntAlu : D=0xfffffc000078e62c
>> 8070266174000: server.detail_cpu0 T0 : @ext2_get_branch+164 :
>> lda r29,-12076(r29) : IntAlu : D=0xfffffc000078b700
>> 8070266175500: server.detail_cpu0 T0 : @ext2_get_branch+168 :
>> bis r31,r11,r16 : IntAlu : D=0xfffffc001f7ffa08
>> 8070266177000: server.detail_cpu0 T0 : @ext2_get_branch+172 :
>> bis r31,r12,r17 : IntAlu : D=0xfffffc001f7ffa08
>> 8070266178500: server.detail_cpu0 T0 : @ext2_get_branch+176 :
>> bsr r26,verify_chain : IntAlu : D=0xfffffc00003ee640
>> 8070266180000: server.detail_cpu0 T0 : @verify_chain : br
>> 0xfffffc00003edbdc : IntAlu :
>> 8070266181500: server.detail_cpu0 T0 : @verify_chain+8 : cmpule
>> r16,r17,r1 : IntAlu : D=0x0000000000000001
>> 8070266183000: server.detail_cpu0 T0 : @verify_chain+12 : beq
>> r1,0xfffffc00003edc00 : IntAlu :
>> 8070266183500: server.detail_cpu1 T0 : @_read_lock+12 : stl_c
>> r1,0(r16) : MemWrite : D=0x0000000000000001
>> A=0xfffffc001f4aaf28
>> 8070266185000: server.detail_cpu1 T0 : @_read_lock+16 : beq
>> r1,0xfffffc00005e89c4 : IntAlu :
>> 8070266186000: server.detail_cpu0 T0 : @verify_chain+16 : ldq
>> r1,0(r16) : MemRead : D=0xfffffc001f4aaec8 A=0xfffffc001f7ffa08
>> 8070266186500: server.detail_cpu1 T0 : @_read_lock+20 :
>> mb : MemRead :
>> 8070266188000: server.detail_cpu1 T0 : @_read_lock+24 : ret
>> (r26) : IntAlu :
>> 8070266189000: server.detail_cpu0 T0 : @verify_chain+20 : ldl
>> r2,8(r16) : MemRead : D=0x000000000000200c A=0xfffffc001f7ffa10
>> 8070266189500: server.detail_cpu1 T0 : @ext2_get_branch+160 :
>> ldah r29,58(r26) : IntAlu : D=0xfffffc000078e62c
>> 8070266190500: server.detail_cpu0 T0 : @verify_chain+24 : zapnot
>> r2,15,r2 : IntAlu : D=0x000000000000200c
>> 8070266191000: server.detail_cpu1 T0 : @ext2_get_branch+164 :
>> lda r29,-12076(r29) : IntAlu : D=0xfffffc000078b700
>> 8070266192500: server.detail_cpu1 T0 : @ext2_get_branch+168 :
>> bis r31,r11,r16 : IntAlu : D=0xfffffc001f6c3a08
>> 8070266194000: server.detail_cpu1 T0 : @ext2_get_branch+172 :
>> bis r31,r12,r17 : IntAlu : D=0xfffffc001f6c3a08
>> 8070266195500: server.detail_cpu1 T0 : @ext2_get_branch+176 :
>> bsr r26,verify_chain : IntAlu : D=0xfffffc00003ee640
>>
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users