On Saturday, August 25, 2018 at 9:11:26 AM UTC-7, Martin Thompson wrote: > > To perform an update via the XADD instruction the cache line containing > the word must first be acquired. x86 uses the MESI cache coherence protocol > and to get the cacheline for update an RFO (Read/Request For Ownership) > message must be sent as a bus transaction. These requests are queued per > core and in the uncore and effectively avoid starvation. Events are > available, e.g. OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO. From the > perspective of instructions each thread takes a finite number of steps to > complete. The CAS equivalent would be lock-free, rather than wait-free, as > the number of steps per thread is not finite. >
The CAS might never "succeed" in a capped number of steps (i.e. someone else could forever beat it to the memory location and the compare could fail in each and every attempt), but the CAS instruction will complete, one way or another, on all practical hardware implementations, in a capped number of steps. A fundamental design requirement in all hardware implementations I'm aware of is to prevent the starvation of any possible (at least user mode, and probably all) instructions, including CAS. This might be achieved in many different ways, often leveraging qualities of the cache protocols and the knowledge that the number of processors and other components (like memory controllers, coherence protocol coordination points, queue depths in various places, etc.) in the system is capped. "How exactly" doesn't matter. It's the job of whomever designs the hardware to make sure it is so. Think about it this way: In any system where it is impossible for you to cause a hardware watchdog reset from user code, there's a "capped number of steps" for executing any individual user-mode instruction, which by definition makes them all (individually) wait-free. There are no non-wait-free user mode instructions in such systems. QED. > > On Saturday, 25 August 2018 06:00:22 UTC+1, Peter Veentjer wrote: >> >> I'm polishing up my lock free knowledge and one question keeps on bugging >> me. >> >> The question is about the LOCK XADD and why it is considered to be wait >> free. >> >> AFAIK for wait freedom there needs to be some fairness. >> >> So image a concurrent counter using a spin on a cas to increment a >> counter, then at least 1 thread wins and makes progress. Therefor this >> implementation is lock free. It isn't wait free because some thread could >> be spinning without bound. The problem here is that there is no fairness. >> >> Now imagine this counter would be implemented using a LOCK XADD and >> therefor there is no need for a loop. What is guaranteeing that every >> thread is going to make progress in a bound number of steps? E.g. could it >> happen that one thread is continuously denied exclusive access to the >> memory and therefor it won't complete in a bound number of steps? Or do the >> request get stored in some kind of list and the requests are processed in >> this order? This order would provide fairness and then it will be wait free. >> >> I have been looking at the Intel documentation of LOCK XADD; but it >> remains unclear. >> > -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
