Hi everybody,

While running some benchmarks (specifically, a modified version of ocean-contiguous-partitions from Splash-3, https://github.com/SakalisC/Splash-3), we encountered a deadlock.
After diving into the trace files, we found that an atomic instruction was locking the memory of the requested block. This lock has to be released by the 'stul' micro-op, but its memory request was blocked in the LSQ unit because a load was waiting for a cache response. That load will never finish, because it refers to the same cache block as the locked one.

X86 atomics are defined surrounded by two memory barriers:

    mfence
    ldstl
    ...
    stul
    mfence

so a later memory instruction has to wait until the mfence finishes.

The memory dependence unit has a special handler for fences. When a fence is added, it records that a fence is active, together with its sequence number. From then on, it adds the current fence as a memory dependency to every following instruction until the fence commits:

+---+------------+-----+
|seq|Instructions|Fence|
+---+------------+-----+
|  0| add        |     |
|  1| mfence     |  1  |
|  2| ldstl      |  1  |
|  3| add        |  1  |
|  4| stul       |  1  |
|  5| mfence     |  5  |
|  6| load       |  5  |
|  7| mfence     |  7  |
+---+------------+-----+

With this idea, everything should work, but what happens when a later mfence is squashed? Looking at the memory dependence unit, we see that a fence being removed checks whether it is the current fence and, if it is, the fence is disabled in the memory dependence unit. So, what happens if a fence is squashed while a previous fence has not committed yet?
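To make the question concrete, the bookkeeping described above can be modelled in a few lines of C++. This is only a simplified sketch based on the description in this mail, not the actual gem5 memory dependence unit code; the names SingleFenceTracker, insertFence, removeFence, dependency and replaySquashWithSingleFence are made up for the example, and it assumes that squashed fences are removed youngest first.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using SeqNum = std::uint64_t;

// Simplified model (assumption, not gem5 code): only the most recently
// inserted fence is remembered.
class SingleFenceTracker
{
  public:
    // A new fence overwrites whatever fence was tracked before.
    void insertFence(SeqNum seq)
    {
        active = true;
        current = seq;
    }

    // Called when a fence commits or is squashed: the fence state is
    // cleared only if this fence is the one currently tracked.
    void removeFence(SeqNum seq)
    {
        if (active && seq == current)
            active = false;
    }

    // Fence that a newly inserted memory instruction must wait for.
    std::optional<SeqNum> dependency() const
    {
        if (!active)
            return std::nullopt;
        return current;
    }

  private:
    bool active = false;
    SeqNum current = 0;
};

// Hypothetical replay: fence 1 commits, fence 5 stays in flight, and the
// wrong-path fences 8 and 12 are inserted and then squashed.
std::optional<SeqNum> replaySquashWithSingleFence()
{
    SingleFenceTracker tracker;
    tracker.insertFence(1);
    tracker.removeFence(1);   // mfence seq:1 commits
    tracker.insertFence(5);   // mfence seq:5, not committed yet
    tracker.insertFence(8);   // wrong-path mfence
    tracker.insertFence(12);  // wrong-path mfence
    tracker.removeFence(12);  // squash, youngest first (assumed order)
    tracker.removeFence(8);   // no effect: 8 is no longer the current fence
    return tracker.dependency();  // what a load inserted now would see
}
```

In this model, replaySquashWithSingleFence() returns an empty optional: the fence at seq:5 is still in flight, but nothing remembers it any more.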
In the following table, we can see a possible case:

+---+------------+-----+---------+
|seq|Instructions|Fence|Committed|
+---+------------+-----+---------+
|  0| add        |     |   Yes   |
|  1| mfence     |  1  |   Yes   |
|  2| ldstl      |  1  |   Yes   |
|  3| add        |  1  |   Yes   |
|  4| stul       |  1  |   No    |
|  5| mfence     |  5  |   No    |
|  6| load       |  5  |   No    |
|  7| beq        |  5  |   No    |---+
|  8| mfence     |  8  |   No    |   |
|  9| ldstl      |  5  |   No    |   | Squashed
| 10| sub        |  5  |   No    |   |
| 11| stul       |  5  |   No    |   |
| 12| mfence     | 12  |   No    |<--+
| 13| load       |     |   No    |
+---+------------+-----+---------+

The branch instruction is mispredicted, but new fences were set in the meantime; therefore, the original fence at seq:5 is no longer active even though it has not committed. Now the load instruction at seq:13 can be executed and, if it collides with the unfinished 'stul' instruction, it can cause a memory dependency violation and, later, a deadlock.

It should be like this:

+---+------------+-----+---------+
|seq|Instructions|Fence|Committed|
+---+------------+-----+---------+
|  0| add        |     |   Yes   |
|  1| mfence     |  1  |   Yes   |
|  2| ldstl      |  1  |   Yes   |
|  3| add        |  1  |   Yes   |
|  4| stul       |  1  |   No    |
|  5| mfence     |  5  |   No    |<-----------------+
|  6| load       |  5  |   No    |                  |
|  7| beq        |  5  |   No    |---+              |
|  8| mfence     |  8  |   No    |   |              |
|  9| ldstl      |  5  |   No    |   | Squashed     | Dependency
| 10| sub        |  5  |   No    |   |              | Recovered
| 11| stul       |  5  |   No    |   |              |
| 12| mfence     | 12  |   No    |<--+              |
| 13| load       |  5  |   No    |------------------+
+---+------------+-----+---------+

To solve this problem, we have several ideas:

- Store all the active fences in a "stack-like" structure and, when a fence is removed or squashed, recover the last active fence. (This is the solution we implemented; please find the patch attached.)
- Give the branch the information about the last active fence and, when the branch is squashed, recover that fence.
- Add to the new fence a dependency on the currently active fence, and recover that fence when the new one is squashed.

We would like to know your thoughts about this problem and how to solve it.
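For discussion, the first idea can be sketched roughly as follows. This is an illustrative model only, not the attached patch and not gem5 code; FenceStack, insertFence, removeFence, dependency and replaySquashWithFenceStack are hypothetical names chosen for the example, and the replay assumes squashed fences are removed youngest first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using SeqNum = std::uint64_t;

// Illustrative model (assumption, not the attached patch): keep every
// in-flight fence, so that removing one exposes the previous one again.
class FenceStack
{
  public:
    // Fences are inserted in program order, so the youngest is at the back.
    void insertFence(SeqNum seq) { fences.push_back(seq); }

    // Called when a fence commits (usually the oldest entry) or is
    // squashed (usually the youngest entry); only that entry is erased.
    void removeFence(SeqNum seq)
    {
        auto it = std::find(fences.begin(), fences.end(), seq);
        if (it != fences.end())
            fences.erase(it);
    }

    // Fence that a newly inserted memory instruction must wait for:
    // the youngest fence that is still in flight.
    std::optional<SeqNum> dependency() const
    {
        if (fences.empty())
            return std::nullopt;
        return fences.back();
    }

  private:
    std::vector<SeqNum> fences;  // in-flight fences, oldest first
};

// Hypothetical replay of the case in the tables above.
std::optional<SeqNum> replaySquashWithFenceStack()
{
    FenceStack stack;
    stack.insertFence(1);
    stack.removeFence(1);   // mfence seq:1 commits
    stack.insertFence(5);   // mfence seq:5, not committed yet
    stack.insertFence(8);   // wrong-path mfence
    stack.insertFence(12);  // wrong-path mfence
    stack.removeFence(12);  // squash, youngest first (assumed order)
    stack.removeFence(8);
    return stack.dependency();  // dependency for the load at seq:13
}
```

In this model, the load inserted after the squash depends on the fence at seq:5 again, which matches the "Dependency Recovered" arrow in the second table. A vector is used instead of a strict stack because commits remove entries from the old end while squashes remove them from the young end.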
Environment used:

- Arch: X86
- Simulation: Full-System
- Number of Cores: 16
- Cache: Ruby (L1 and L2)
- Coherence Protocol: MESI_TWO_Levels
- Kernel: Linux 4.9.3
- OS: Ubuntu 16.04
- Application: ocean-contiguous (with modifications)

Thanks a lot for your attention.

Best Regards,
Eduardo

--
Eduardo José Gómez Hernández
[email protected]
Faculty of Computer Science
University of Murcia (Spain)
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
