On Wed, 22 Aug 2018, Arnd Bergmann wrote:
> On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka wrote:
> > On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > > On Wed, 22 Aug 2018, Sinan Kaya wrote:
> >
> > According to the Alpha handbook, non-overlapping accesses may be
> > reordered.
> >
> > So if someone does
> > writel(REG1);
> > readl(REG2);
> >
> > readl may (according to the spec) reach the device before writel, although
> > actual experiments suggest that the read flushes the queued writes.
> >
> > I would be quite interested to know why Linux developers decided that readl
> > should be implemented as "read+barrier" and writel should be implemented
> > as "barrier+write". Why is there this asymmetry in the barriers?
>
> I can explain this part: those two barriers are used specifically to order
> an MMIO access against a DMA access: a writel() may be used to start
> a DMA operation copying data from RAM to the device, so we must
> have a barrier between the store to that data and the store to the register
> to ensure the data is visible to the device.
> Similarly, a readl() may check the status of a register that tells us when
> a DMA from device to RAM has completed. We must have a read
> barrier between that mmio load and the load from RAM to prevent
> the data from being prefetched while the MMIO is still in progress.
Then - the question is - why not just use barriers before and after
accesses to DMA'd memory? For DMA into non-coherent memory, the barrier
could be injected into dma_map_* and dma_unmap_* functions (with no change
in drivers) - and for DMA into coherent memory you could have something
like dma_coherent_barrier().
Why does Linux add the barriers between every read and write to memory
mapped registers?
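
To make the question concrete, here is a rough userspace sketch of the two
placements. It is only a model under stated assumptions: every name in it
(model_writel, model_readl, plain_writel, plain_readl, start_tx_*, poll_rx_*)
is made up for illustration, C11 fences stand in for the real architecture
barrier instructions, and dma_coherent_barrier() is just the hypothetical
helper suggested above, not something I claim exists in the kernel. It shows
where the barriers sit, not how any architecture actually implements them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Variant A: barriers folded into the accessors, as described in the
 * quoted explanation. These are illustrative stand-ins, not the kernel's
 * readl()/writel().
 */
static inline void model_writel(uint32_t val, volatile uint32_t *reg)
{
    /* "barrier + write": earlier stores to the DMA buffer come first */
    atomic_thread_fence(memory_order_release);
    *reg = val;
}

static inline uint32_t model_readl(const volatile uint32_t *reg)
{
    /* "read + barrier": later loads from the DMA buffer come after */
    uint32_t val = *reg;
    atomic_thread_fence(memory_order_acquire);
    return val;
}

/*
 * Variant B: plain accessors plus an explicit barrier placed only where
 * DMA'd memory is involved. dma_coherent_barrier() is hypothetical.
 */
static inline void plain_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

static inline uint32_t plain_readl(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void dma_coherent_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Fill the buffer, then tell the (imaginary) device to start reading it. */
static void start_tx_implicit(uint32_t *dma_buf, volatile uint32_t *doorbell,
                              uint32_t payload)
{
    assert(dma_buf && doorbell);
    dma_buf[0] = payload;
    model_writel(1, doorbell);      /* barrier hidden in the accessor */
}

static void start_tx_explicit(uint32_t *dma_buf, volatile uint32_t *doorbell,
                              uint32_t payload)
{
    assert(dma_buf && doorbell);
    dma_buf[0] = payload;
    dma_coherent_barrier();         /* barrier visible in the driver */
    plain_writel(1, doorbell);
}

/* Check a "DMA done" status bit, then read what the device wrote. */
static int poll_rx_implicit(const uint32_t *dma_buf,
                            const volatile uint32_t *status, uint32_t *out)
{
    if (!(model_readl(status) & 1))
        return 0;
    *out = dma_buf[0];
    return 1;
}

static int poll_rx_explicit(const uint32_t *dma_buf,
                            const volatile uint32_t *status, uint32_t *out)
{
    if (!(plain_readl(status) & 1))
        return 0;
    dma_coherent_barrier();
    *out = dma_buf[0];
    return 1;
}
```

In variant B, a driver that only touches registers and never looks at DMA'd
memory would pay for no barrier at all, which is the point of the question.
The cost is that every driver doing DMA into coherent memory has to remember
to place the explicit barrier itself.
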
> > Does ARM have some hardware magic that prevents reordering the write and
> > the read in this case?
>
> Most architectures have this AFAICT, ARM and x86 definitely do, and
> PCI requires this to be true on the bus:
>
> All MMIO accesses from a given CPU to a given device (according
> to an architecture-specific definition of "device") are ordered with respect
> to one another.
If ARM guarantees that the accesses to a given device are not reordered -
then the barriers in readl and writel are superfluous.
> If the hardware does not guarantee that, for simple load/store operations
> on uncached device memory, then we need a full barrier after each store
> in addition to the write barrier needed for the DMA synchronization.
>
> Arnd
Mikulas