On Mon, 20 Aug 2018, Sinan Kaya wrote:
> > Likewise see memory-barriers.txt throughout concerning `mmiowb' (which is
> > an obviously lighter weight barrier compared to `readX').
>
> Here is a better reference from memory-barriers.txt
>
> (*) readX(), writeX():
>
> Whether these are guaranteed to be fully ordered and uncombined with
> respect to each other on the issuing CPU depends on the characteristics
> defined for the memory window through which they're accessing. On later
> i386 architecture machines, for example, this is controlled by way of the
> MTRR registers.
>
> Ordinarily, these will be guaranteed to be fully ordered and uncombined,
> provided they're not accessing a prefetchable device.

 See the next sentence too, and I am concerned about the "characteristics
defined for the memory window" qualification here -- how is the memory
window defined in the general sense? For i386 we have the MTRR registers,
but how about other platforms?

 Anyway, if we were to guarantee that `readX' and `writeX' were fully
ordered, then we would have to place barriers in matching places across
accessors, i.e. either before or after the actual MMIO access, but
uniformly across all of them, rather than having them mixed. Placing them
beforehand is normally better as buffers will often have drained already
by that time, meaning the performance cost of the barrier will be lower.

 As of commit 92d7223a7423 ("alpha: io: reorder barriers to
guarantee writeX() and iowriteX() ordering #2") we have barriers in mixed
positions: before the access in writes and after it in reads, meaning
that if we issue say:

	writel(x, foo);
	y = readl(bar);

then the read from `bar' can be reordered ahead of the write to `foo',
which is very, very bad, breaking requirements set out across
io_ordering.txt and memory-barriers.txt. I am fairly sure this is the
cause of the regression observed.
You need to make a corresponding update to `readX' and `ioreadX' then
(and once that has been fixed we can consider the general matter of MMIO
barriers independently).

  Maciej