Hi,

On Tue, Jun 21, 2016 at 02:36:17PM +0000, Stuart Yoder wrote:
-----Original Message-----
From: Will Deacon [mailto:will.dea...@arm.com]
Sent: Tuesday, June 21, 2016 4:43 AM
To: Robin Murphy <robin.mur...@arm.com>
Cc: Stuart Yoder <stuart.yo...@nxp.com>; linux-arm-ker...@lists.infradead.org;
    iommu@lists.linux-foundation.org; Nipun Gupta <nipun.gu...@nxp.com>;
    Bharat Bhushan <bharat.bhus...@nxp.com>; Brian Starkey <brian.star...@arm.com>
Subject: Re: SMMU driver and stall vs terminate mode

On Mon, Jun 20, 2016 at 05:08:45PM +0100, Robin Murphy wrote:
> On 20/06/16 16:28, Stuart Yoder wrote:
> >Right now the SMMU driver is hardcoded to configure 'stall' mode for
> >context faults:
> >
> >    /* SCTLR */
> >    reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
> >
> >We are running into an issue with a device where it seems to behave sanely
> >when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> >unaware that an access violation occurred.
>
> Does the device keep issuing transactions after the initial faulting one, by
> any chance? Brian (+cc) has seen similar-sounding issues in the past (albeit
> with backports to some horrible Android kernel), and I think we concluded
> that there's an inherent race window between writing RESUME and acking the
> interrupt in which MMU-500 can process another faulting transaction and
> reassert the IRQ without Linux realising, which then gets lost and things go
> out of whack.
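For reference, the two fault models differ in the quoted SCTLR value only by the CFCFG bit. The sketch below is stand-alone user-space C rather than a driver patch. The bit positions are as I recall them from arm-smmu.c of this era and should be checked against your tree, and cb_sctlr() and its stall_on_fault parameter are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Context bank SCTLR bits. Values are recalled from arm-smmu.c of this
 * era (an assumption); verify them against your own tree.
 */
#define SCTLR_CFCFG	(1u << 7)	/* 1 = stall on context fault, 0 = terminate */
#define SCTLR_CFIE	(1u << 6)	/* context fault interrupt enable */
#define SCTLR_CFRE	(1u << 5)	/* context fault report enable */
#define SCTLR_AFE	(1u << 2)
#define SCTLR_TRE	(1u << 1)
#define SCTLR_M		(1u << 0)	/* translation enable */
#define SCTLR_EAE_SBOP	(SCTLR_AFE | SCTLR_TRE)

/* Hypothetical helper: build the SCTLR value for either fault model. */
uint32_t cb_sctlr(bool stall_on_fault)
{
	uint32_t reg = SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;

	if (stall_on_fault)
		reg |= SCTLR_CFCFG;

	return reg;
}
```

With these assumed values, cb_sctlr(true) reproduces the hardcoded stall-mode value and cb_sctlr(false) gives the terminate-mode one. Where the flag would come from (device tree, module parameter, per-SoC quirk) is deliberately left open.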
The problem in my case ended up being that one of the IRQ lines for the MMU wasn't actually wired up - so the MMU driver never knew there was an IRQ to handle and so never un-stalled the transactions. I think it was the context bank's line, so global faults worked fine but not context faults. Of course, there may also be a race on RESUME.
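To make that failure mode concrete, here is a toy user-space model, not driver code. All names (toy_cb, toy_run and so on) are made up for illustration, and RESUME is modelled simply as "terminate the stalled transaction". It only shows that in stall mode the transaction completes if and only if the IRQ reaches the handler.

```c
#include <stdbool.h>

enum txn_state { TXN_PENDING, TXN_STALLED, TXN_FAULTED };

/* Toy model of a single context bank (hypothetical, for illustration). */
struct toy_cb {
	bool stall_mode;	/* models SCTLR.CFCFG */
	bool irq_wired;		/* is the context fault IRQ line connected? */
	enum txn_state txn;
};

/* A faulting transaction arrives at the context bank. */
void toy_fault(struct toy_cb *cb)
{
	cb->txn = cb->stall_mode ? TXN_STALLED : TXN_FAULTED;
}

/*
 * CPU side: the handler only runs if the IRQ is delivered, and only the
 * handler writes RESUME (modelled here as terminating the transaction).
 */
void toy_service(struct toy_cb *cb)
{
	if (cb->irq_wired && cb->txn == TXN_STALLED)
		cb->txn = TXN_FAULTED;
}

/* Run one faulting transaction through the model; return its final state. */
enum txn_state toy_run(bool stall_mode, bool irq_wired)
{
	struct toy_cb cb = { stall_mode, irq_wired, TXN_PENDING };

	toy_fault(&cb);
	toy_service(&cb);

	return cb.txn;
}
```

In this model toy_run(true, false) stays in TXN_STALLED forever, which matches the symptom of a device that never learns about the violation, while terminate mode faults regardless of the IRQ wiring.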
Do we not detect this with the MULTI bit in the FSR?

> >Is there really some assumption that all devices that send transactions
> >through the SMMU _must_ be able to handle stall mode? I am trying to
> >find out from our hw designers what is going on at the signal level for
> >the device in question, but it seems to me that 'terminate' mode exists
> >for a reason and I wonder what your thoughts are about providing a
> >configuration option to allow configuration of terminate mode if a particular
> >SoC requires it.
>
> Personally, I'd quite happily leave it turned off (MMU-400/401 don't support
> stalling anyway), but I recall Will having a fairly reasonable-sounding
> argument in favour, which I now can't remember the details of. Hopefully he
> might remind us, unless his conference is too enthralling.

Given that we don't do anything particularly useful in the context fault
handler, I also wouldn't object to turning this off (and removing the
retry/reporting machinery). However, I'd want a better description of *why*
it's causing problems first, so that we can justify the decision in case
anybody is using this out of tree.
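On the MULTI question above, checking the bit itself is cheap. A minimal sketch follows, assuming the FSR bit positions I remember from arm-smmu.c (MULTI in bit 31, SS in bit 30, TF in bit 1); the helper names are hypothetical. Whether MULTI is reliably set in the race window described earlier is exactly the open question, and this does not answer it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Context bank FSR bits (recalled values, an assumption; verify in your tree). */
#define FSR_MULTI	(1u << 31)	/* a fault occurred while FSR was already non-zero */
#define FSR_SS		(1u << 30)	/* a transaction is currently stalled */
#define FSR_TF		(1u << 1)	/* translation fault */

/* Hypothetical helper: was a further fault recorded on top of the first? */
bool fsr_multiple_faults(uint32_t fsr)
{
	return (fsr & FSR_MULTI) != 0;
}

/* Hypothetical helper: is there a stalled transaction waiting for RESUME? */
bool fsr_stalled(uint32_t fsr)
{
	return (fsr & FSR_SS) != 0;
}
```

A fault handler could log or count the cases where fsr_multiple_faults() is true before clearing the FSR, which would at least show whether additional faults are being folded into one interrupt on the affected SoC.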
Is map-on-fault a valid enough use-case? Drivers can register their own fault handlers, so even if arm-smmu isn't doing anything interesting, I think the master's driver might.
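In the kernel, a master's driver registers such a handler with iommu_set_fault_handler(), and, as far as I understand the driver at the time, arm-smmu retries the stalled transaction when the handler returns 0 and terminates it otherwise. The sketch below is a user-space model of that flow only. Every toy_* name is hypothetical, and the page-table state is reduced to a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT	12
#define TOY_NPAGES	16

enum toy_resume { TOY_RESUME_RETRY, TOY_RESUME_TERMINATE };

struct toy_domain;

/* Loosely shaped after the kernel's fault handler type: return 0 if handled. */
typedef int (*toy_fault_handler_t)(struct toy_domain *dom,
				   unsigned long iova, void *token);

struct toy_domain {
	bool mapped[TOY_NPAGES];	/* stand-in for real page tables */
	toy_fault_handler_t handler;	/* set by the master's driver */
	void *token;
};

/* Example master-driver handler: map the missing page on demand. */
int toy_map_on_fault(struct toy_domain *dom, unsigned long iova, void *token)
{
	unsigned long page = iova >> TOY_PAGE_SHIFT;

	(void)token;
	if (page >= TOY_NPAGES)
		return -1;	/* outside what this driver is willing to map */

	dom->mapped[page] = true;
	return 0;
}

/* SMMU side: decide what to write to RESUME for a stalled fault. */
enum toy_resume toy_context_fault(struct toy_domain *dom, unsigned long iova)
{
	if (dom->handler && dom->handler(dom, iova, dom->token) == 0)
		return TOY_RESUME_RETRY;

	return TOY_RESUME_TERMINATE;
}
```

The point of the model is that the retry path only exists in stall mode: with CFCFG clear the transaction has already been terminated by the time any handler runs, so there is nothing left to retry.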
I am trying to get more details from HW owners of this device as to its behavior in these 2 different SMMU modes.
My understanding is that it should be transparent to the hardware. It just looks like translation is taking a particularly long time (before ultimately faulting). As long as the MMU IRQ handler is running as it should, the transactions will eventually fault as normal.

Thanks,
Brian
Stuart
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu