Cédric Le Goater <c...@kaod.org> writes: > On 16/07/2019 11:10, Alexey Kardashevskiy wrote: >> On 16/07/2019 18:59, Cédric Le Goater wrote: >>> On 15/07/2019 09:11, Alexey Kardashevskiy wrote: >>>> There is a race between releasing an irq on one cpu and fetching it >>>> from XIVE on another cpu as there does not seem to be any locking between >>>> these, probably because xive_irq_chip::irq_shutdown() is supposed to >>>> remove the irq from all queues in the system which it does not do. >>>> >>>> As a result, when such released irq appears in a queue, we take it >>>> from the queue but we do not change the current priority on that cpu and >>>> since there is no handler for the irq, EOI is never called and the cpu >>>> current priority remains elevated (7 vs. 0xff==unmasked). If another irq >>>> is assigned to the same cpu, then that device stops working until irq >>>> is moved to another cpu or the device is reset. >>>> >>>> This adds a new ppc_md.orphan_irq callback which is called if no irq >>>> descriptor is found. The XIVE implementation drops the current priority >>>> to 0xff which effectively unmasks interrupts in a current CPU. >>> >>> >>> The test on generic_handle_irq() catches interrupt events that >>> were served on a target CPU while the source interrupt was being >>> shutdown on another CPU. >>> >>> The orphan_irq() handler restores the CPPR in such cases. >>> >>> This looks OK to me. I would have added some more comments in the >>> code. >> >> Which and where? Thanks, > > Above xive_orphan_irq() explaining the complete problem that we are > addressing. XIVE is not super obvious when looking at the code ...
Yes adding a comment would be good, thanks. This will also need a Fixes: tag. cheers