On 5/23/2019 8:35 AM, Marek Szyprowski wrote:
> Hi Robin,
>
> On 2019-05-22 15:55, Robin Murphy wrote:
>> On 22/05/2019 14:34, Christoph Hellwig wrote:
>>> On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
>>>> Sure, but that should be irrelevant since the effective problem here
>>>> is in the sync_*_for_cpu direction, and it's the unmap which nobbles
>>>> the buffer.
>>>> If the driver does this:
>>>>
>>>>     dma_map_single(whole buffer);
>>>>     <device writes to part of buffer>
>>>>     dma_unmap_single(whole buffer);
>>>>     <contents of rest of buffer now undefined>
>>>>
>>>> then it could instead do this and be happy:
>>>>
>>>>     dma_map_single(whole buffer, SKIP_CPU_SYNC);
>>>>     <device writes to part of buffer>
>>>>     dma_sync_single_for_cpu(updated part of buffer);
>>>>     dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
>>>>     <contents of rest of buffer still valid>
>>>
>>> Assuming the driver knows how much was actually DMAed this would
>>> solve the issue.  Horia, does this work for you?

In my particular case, input is provided as a scatterlist, out of which
first N bytes are problematic (not written to by device and corrupted
when swiotlb bouncing is needed), while remaining bytes (Total - N)
are updated by the device.
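
To make the failure mode concrete, here is a toy userspace model of the
bouncing behaviour discussed in this thread. It is only a sketch, not
kernel code: every toy_* name, the buffer sizes and the 0xAA/0x11/0x22
fill values are made up for illustration, and the model assumes that a
DMA_FROM_DEVICE mapping does not copy the original data into the bounce
slot at map time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOTAL        16  /* whole mapped buffer (illustrative size) */
#define N_UNTOUCHED   4  /* leading bytes the device never writes */

/* Hypothetical stand-in for one bounce slot (not a kernel struct). */
struct toy_mapping {
    unsigned char bounce[TOTAL];
    unsigned char *orig;   /* the driver's real buffer */
    size_t len;
};

/* Map for "from device": nothing is copied in, so the bounce slot
 * holds stale junk (modelled here as 0xAA). */
static void toy_map(struct toy_mapping *m, unsigned char *buf, size_t len)
{
    assert(len <= TOTAL);
    m->orig = buf;
    m->len = len;
    memset(m->bounce, 0xAA, sizeof(m->bounce));
}

/* The "device" writes into the bounce slot, never the real buffer. */
static void toy_device_write(struct toy_mapping *m, size_t off, size_t len,
                             unsigned char val)
{
    assert(off + len <= m->len);
    memset(m->bounce + off, val, len);
}

/* Copy a range back to the real buffer, as a sync-for-cpu step would. */
static void toy_sync_for_cpu(struct toy_mapping *m, size_t off, size_t len)
{
    assert(off + len <= m->len);
    memcpy(m->orig + off, m->bounce + off, len);
}

/* Unmap copies the whole slot back unless the caller skips the sync. */
static void toy_unmap(struct toy_mapping *m, int skip_cpu_sync)
{
    if (!skip_cpu_sync)
        toy_sync_for_cpu(m, 0, m->len);
}

/* Returns 1 if the first N_UNTOUCHED bytes kept their old value (0x11)
 * and the remaining bytes hold the device data (0x22), 0 otherwise. */
int toy_prefix_survives(int use_partial_sync)
{
    unsigned char buf[TOTAL];
    struct toy_mapping m;
    size_t i;

    memset(buf, 0x11, sizeof(buf));
    toy_map(&m, buf, sizeof(buf));
    toy_device_write(&m, N_UNTOUCHED, TOTAL - N_UNTOUCHED, 0x22);
    if (use_partial_sync)
        toy_sync_for_cpu(&m, N_UNTOUCHED, TOTAL - N_UNTOUCHED);
    toy_unmap(&m, use_partial_sync);

    for (i = 0; i < N_UNTOUCHED; i++)
        if (buf[i] != 0x11)
            return 0;
    for (i = N_UNTOUCHED; i < TOTAL; i++)
        if (buf[i] != 0x22)
            return 0;
    return 1;
}
```

In this model toy_prefix_survives(0) mirrors the plain map/unmap
sequence, where the first N bytes get overwritten with stale bounce
contents, and toy_prefix_survives(1) mirrors the SKIP_CPU_SYNC plus
partial dma_sync_single_for_cpu() sequence quoted above, where they
survive.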
>>
>> Ohhh, and now I've just twigged what you were suggesting - your
>> DMA_ATTR_PARTIAL flag would mean "treat this as a read-modify-write of
>> the buffer because we *don't* know exactly which parts the device may
>> write to". So indeed if we did go down that route we wouldn't need any
>> of the sync stuff I was worrying about (but I might suggest naming it
>> DMA_ATTR_UPDATE instead). Apologies for being slow :)
>
> Don't we have DMA_BIDIRECTIONAL for such case? Maybe we should update
> documentation a bit to point that DMA_FROM_DEVICE expects the whole
> buffer to be filled by the device?
>

Or, put more bluntly, a driver must not rely on previous data in the
area mapped DMA_FROM_DEVICE.
This limitation stems from the buffer bouncing mechanism of the swiotlb
DMA API backend, which other backends (e.g. IOMMU) might not suffer from.

Btw, the device I am working on (caam crypto engine) is deployed
in several SoCs configured differently - with or without an IOMMU
(and coherent or non-coherent etc.).
IOW it's a "power user" of the DMA API and I appreciate all the help
in solving / clarifying these kinds of implicit assumptions.

Thanks,
Horia
