Hi Gabriele,

Thanks for your reply.
That makes sense. So there is no consistency between the CPU's view and the HCA's view; it all depends on when the cache gets flushed to RAM. However, if the HCA performs reads from the L3 cache, then everything should be consistent, right? As for the ordering of the writes, I think we can assume they are ordered up to the cache hierarchy (with no guarantees for when they appear in RAM).

Ido Shamai (@Mellanox) told me that RDMA writes go to the L3 cache. This, plus the on-chip memory controllers, makes me think that reads should come from the L3 cache too.

I believe the atomic operations would be a lot more expensive than reads/writes. I'm targeting maximum performance, so I don't want to go down that route yet.

--Anuj

On Tue, Nov 12, 2013 at 6:16 AM, Gabriele Svelto <[email protected]> wrote:
> Hi Anuj,
>
> On 10/11/2013 11:46, Anuj Kalia wrote:
>>
>> How can this happen in the presence of memory barriers? With barriers,
>> A[i].counter should be updated later and therefore should always be
>> smaller than A[i].value.
>
> Memory barriers such as mfence synchronize memory operations from the
> point of view of CPUs only. Practically this means that the stores you
> wrote might go out to memory in a different order than what the processor
> sees, and an external device such as a PCIe HCA might thus observe a
> different ordering even in the presence of memory barriers.
>
> To ensure that an external device sees your stores in the order you meant,
> you would need some form of external barrier, though I do not know if that
> is possible at all in userspace, and besides it would be a fragile solution.
>
> Instead I would suggest you use verbs atomic operations such as
> IBV_WR_ATOMIC_CMP_AND_SWP and IBV_WR_ATOMIC_FETCH_AND_ADD to implement
> what you have in mind.
>
> Gabriele
