On 8/31/20 8:40 AM, Christoph Hellwig wrote:
> On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:
>> Hello,
>>
>> On 7/8/20 5:24 PM, Christoph Hellwig wrote:
>>> Use the DMA API bypass mechanism for direct window mappings. This uses
>>> common code and speed up the direct mapping case by avoiding indirect
>>> calls just when not using dma ops at all. It also fixes a problem where
>>> the sync_* methods were using the bypass check for DMA allocations, but
>>> those are part of the streaming ops.
>>>
>>> Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
>>> has never been well defined, as is only used by a few drivers, which
>>> IIRC never showed up in the typical Cell blade setups that are affected
>>> by the ordering workaround.
>>>
>>> Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
>>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>>> ---
>>>  arch/powerpc/Kconfig              |  1 +
>>>  arch/powerpc/include/asm/device.h |  5 --
>>>  arch/powerpc/kernel/dma-iommu.c   | 90 ++++---------------------------
>>>  3 files changed, 10 insertions(+), 86 deletions(-)
>>
>> I am seeing corruptions on a couple of POWER9 systems (boston) when
>> stressed with IO. stress-ng gives some results but I have first seen
>> it when compiling the kernel in a guest and this is still the best way
>> to raise the issue.
>>
>> These systems have a SAS Adaptec controller :
>>
>>   0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
>>
>> When the failure occurs, the POWERPC EEH interrupt fires and dumps
>> low-level PHB4 registers among which :
>>
>>   [ 2179.251069490,3] PHB#0003[0:3]: phbErrorStatus = 0000028000000000
>>   [ 2179.251117476,3] PHB#0003[0:3]: phbFirstErrorStatus = 0000020000000000
>>
>> The bits raised identify a PPC 'TCE' error, which means it is related
>> to DMAs. See below for more details.
>>
>>
>> Reverting this patch "fixes" the issue but it is probably elsewhere,
>> in some other layers or in the aacraid driver. How should I proceed
>> to get more information ?
>
> The aacraid DMA masks look like a mess. Can you try the hack
> below and see if it helps?

No effect. The system crashes the same. But Alexey spotted some issue
with swiotlb.

C.