> This barrier cannot be a simple dma_wmb(), since a dma_wmb() is only
> used to guarantee the ordering, with respect to other writes,
> to cache coherent DMA memory.

Could you explain this a bit more (perhaps in a code comment as well)?

Ensuring other writes are done before writing the "GO!" bit should be
enough, no?

(If it is not, do we need heavier barriers in other places, too?)

