Hi Hector,

Sorry it took me so long to get to this. Some comments below.

On Fri, Mar 05, 2021 at 06:38:51AM +0900, Hector Martin wrote:
> This is the root interrupt controller used on Apple ARM SoCs such as the
> M1. This irqchip driver performs multiple functions:
> 
> * Handles both IRQs and FIQs
> 
> * Drives the AIC peripheral itself (which handles IRQs)
> 
> * Dispatches FIQs to downstream hard-wired clients (currently the ARM
>   timer).
> 
> * Implements a virtual IPI multiplexer to funnel multiple Linux IPIs
>   into a single hardware IPI
> 
> Signed-off-by: Hector Martin <[email protected]>
> ---
>  MAINTAINERS                     |   2 +
>  drivers/irqchip/Kconfig         |   8 +
>  drivers/irqchip/Makefile        |   1 +
>  drivers/irqchip/irq-apple-aic.c | 710 ++++++++++++++++++++++++++++++++
>  include/linux/cpuhotplug.h      |   1 +
>  5 files changed, 722 insertions(+)
>  create mode 100644 drivers/irqchip/irq-apple-aic.c

[...]

> + * Implementation notes:
> + *
> + * - This driver creates two IRQ domains, one for HW IRQs and internal FIQs,
> + *   and one for IPIs.
> + * - Since Linux needs more than 2 IPIs, we implement a software IRQ controller
> + *   and funnel all IPIs into one per-CPU IPI (the second "self" IPI is unused).
> + * - FIQ hwirq numbers are assigned after true hwirqs, and are per-cpu.
> + * - DT bindings use 3-cell form (like GIC):
> + *   - <0 nr flags> - hwirq #nr
> + *   - <1 nr flags> - FIQ #nr
> + *     - nr=0  Physical HV timer
> + *     - nr=1  Virtual HV timer
> + *     - nr=2  Physical guest timer
> + *     - nr=3  Virtual guest timer
> + *
> + */
> +
> +#define pr_fmt(fmt) "%s: " fmt, __func__

General nit, but I suspect many of the prints in here probably want to be
using the *_ratelimited variants.
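
For example (just a sketch), the "Unknown IRQ event" print in
aic_handle_irq() could become:

        else if (event != 0)
                pr_err_ratelimited("Unknown IRQ event %d, %d\n",
                                   type, irq);

so a misbehaving device can't flood the log.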

> +static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs)
> +{
> +     struct aic_irq_chip *ic = aic_irqc;
> +     u32 event, type, irq;
> +
> +     do {
> +             /*
> +              * We cannot use a relaxed read here, as DMA needs to be
> +              * ordered with respect to the IRQ firing.
> +              */

I think this could be a bit clearer: the readl() doesn't order any DMA
accesses itself, but rather ensures that subsequent reads by the CPU
(which may be from a buffer that was DMA'd to) are ordered after the
read of the MMIO register.
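
Perhaps something along these lines for the comment (feel free to
reword):

        /*
         * We cannot use a relaxed read here: subsequent reads by the
         * CPU (e.g. from a buffer the device just DMA'd to) must be
         * ordered after the read of the event register.
         */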

> +             event = readl(ic->base + AIC_EVENT);
> +             type = FIELD_GET(AIC_EVENT_TYPE, event);
> +             irq = FIELD_GET(AIC_EVENT_NUM, event);
> +
> +             if (type == AIC_EVENT_TYPE_HW)
> +                     handle_domain_irq(aic_irqc->hw_domain, irq, regs);
> +             else if (type == AIC_EVENT_TYPE_IPI && irq == 1)
> +                     aic_handle_ipi(regs);
> +             else if (event != 0)
> +                     pr_err("Unknown IRQ event %d, %d\n", type, irq);
> +     } while (event);
> +
> +     /*
> +      * vGIC maintenance interrupts end up here too, so we need to check
> +      * for them separately. Just report and disable vGIC for now, until
> +      * we implement this properly.
> +      */
> +     if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
> +             read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {
> +             pr_err("vGIC IRQ fired, disabling.\n");
> +             sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
> +     }

What prevents all these system register accesses being speculated up before
the handler?
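
If the answer is "nothing", then I'd have expected some explicit context
synchronisation in here; a rough sketch of the kind of thing I mean (not
saying this is necessarily the right fix):

        isb();  /* don't let the ICH_* sysreg reads be hoisted earlier */
        if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
            read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {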

> +}
> +
> +static int aic_irq_set_affinity(struct irq_data *d,
> +                             const struct cpumask *mask_val, bool force)
> +{
> +     irq_hw_number_t hwirq = irqd_to_hwirq(d);
> +     struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
> +     int cpu;
> +
> +     if (hwirq > ic->nr_hw)
> +             return -EINVAL;
> +
> +     if (force)
> +             cpu = cpumask_first(mask_val);
> +     else
> +             cpu = cpumask_any_and(mask_val, cpu_online_mask);
> +
> +     aic_ic_write(ic, AIC_TARGET_CPU + hwirq * 4, BIT(cpu));
> +     irq_data_update_effective_affinity(d, cpumask_of(cpu));
> +
> +     return IRQ_SET_MASK_OK;
> +}
> +
> +static int aic_irq_set_type(struct irq_data *d, unsigned int type)
> +{
> +     return (type == IRQ_TYPE_LEVEL_HIGH) ? 0 : -EINVAL;
> +}
> +
> +static struct irq_chip aic_chip = {
> +     .name = "AIC",
> +     .irq_mask = aic_irq_mask,
> +     .irq_unmask = aic_irq_unmask,

I know these are driven by the higher-level irq chip code, but I'm a bit
confused as to what provides ordering if, e.g. something ends up calling:

        aic_chip.irq_mask(d);
        ...
        aic_chip.irq_unmask(d);

I can't see any ISBs in here and they're writing to two different registers,
so can we end up with the IRQ masked after this sequence?

> +/*
> + * IPI irqchip
> + */
> +
> +static void aic_ipi_mask(struct irq_data *d)
> +{
> +     u32 irq_bit = BIT(irqd_to_hwirq(d));
> +     int this_cpu = smp_processor_id();
> +
> +     /* No specific ordering requirements needed here. */
> +     atomic_andnot(irq_bit, &aic_vipi_enable[this_cpu]);
> +}

Why not use a per-cpu variable here instead of an array of atomics? The
percpu API has helpers for atomic updates (e.g. this_cpu_or(),
this_cpu_and(), this_cpu_xchg()).
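
e.g. something like this (untested sketch, keeping your naming):

        static DEFINE_PER_CPU(unsigned long, aic_vipi_enable);

        static void aic_ipi_mask(struct irq_data *d)
        {
                u32 irq_bit = BIT(irqd_to_hwirq(d));

                /* No specific ordering requirements needed here. */
                this_cpu_and(aic_vipi_enable, ~irq_bit);
        }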

> +static void aic_ipi_unmask(struct irq_data *d)
> +{
> +     struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
> +     u32 irq_bit = BIT(irqd_to_hwirq(d));
> +     int this_cpu = smp_processor_id();
> +
> +     /*
> +      * This must complete before the atomic_read_acquire() below to avoid
> +      * racing aic_ipi_send_mask(). Use a dummy fetch op with release
> +      * semantics for this. This is arch-specific: ARMv8 B2.3.3 specifies
> +      * that writes with Release semantics are Barrier-ordered-before reads
> +      * with Acquire semantics, even though the Linux arch-independent
> +      * definition of these atomic ops does not.
> +      */

I think a more idiomatic (and portable) way to do this would be to use
the relaxed accessors, but with smp_mb__after_atomic() between them. Do you
have a good reason for _not_ doing it like that?
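
i.e. something like (sketch only):

        atomic_or(irq_bit, &aic_vipi_enable[this_cpu]);
        smp_mb__after_atomic(); /* order the enable write before the flag read */
        if (atomic_read(&aic_vipi_flag[this_cpu]) & irq_bit)
                aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(this_cpu));

which doesn't rely on the ARMv8-specific release->acquire ordering.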

> +     (void)atomic_fetch_or_release(irq_bit, &aic_vipi_enable[this_cpu]);
> +
> +     /*
> +      * If a pending vIPI was unmasked, raise a HW IPI to ourselves.
> +      * No barriers needed here since this is a self-IPI.
> +      */
> +     if (atomic_read_acquire(&aic_vipi_flag[this_cpu]) & irq_bit)

"No barriers needed here" right before an acquire is confusing ;)

> +             aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(this_cpu));
> +}
> +
> +static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
> +{
> +     struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
> +     u32 irq_bit = BIT(irqd_to_hwirq(d));
> +     u32 send = 0;
> +     int cpu;
> +     unsigned long pending;
> +
> +     for_each_cpu(cpu, mask) {
> +             /*
> +              * This sequence is the mirror of the one in aic_ipi_unmask();
> +              * see the comment there. Additionally, release semantics
> +              * ensure that the vIPI flag set is ordered after any shared
> +              * memory accesses that precede it. This therefore also pairs
> +              * with the atomic_fetch_andnot in aic_handle_ipi().
> +              */
> +             pending = atomic_fetch_or_release(irq_bit, &aic_vipi_flag[cpu]);

(same here)

> +             if (!(pending & irq_bit) && (atomic_read_acquire(&aic_vipi_enable[cpu]) & irq_bit))
> +                     send |= AIC_IPI_SEND_CPU(cpu);
> +     }
> +
> +     /*
> +      * The flag writes must complete before the physical IPI is issued
> +      * to another CPU. This is implied by the control dependency on
> +      * the result of atomic_read_acquire() above, which is itself
> +      * already ordered after the vIPI flag write.
> +      */
> +     if (send)
> +             aic_ic_write(ic, AIC_IPI_SEND, send);
> +}
> +
> +static struct irq_chip ipi_chip = {
> +     .name = "AIC-IPI",
> +     .irq_mask = aic_ipi_mask,
> +     .irq_unmask = aic_ipi_unmask,
> +     .ipi_send_mask = aic_ipi_send_mask,
> +};
> +
> +/*
> + * IPI IRQ domain
> + */
> +
> +static void aic_handle_ipi(struct pt_regs *regs)
> +{
> +     int this_cpu = smp_processor_id();
> +     int i;
> +     unsigned long enabled, firing;
> +
> +     /*
> +      * Ack the IPI. We need to order this after the AIC event read, but
> +      * that is enforced by normal MMIO ordering guarantees.
> +      */
> +     aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER);
> +
> +     /*
> +      * The mask read does not need to be ordered. Only we can change
> +      * our own mask anyway, so no races are possible here, as long as
> +      * we are properly in the interrupt handler (which is covered by
> +      * the barrier that is part of the top-level AIC handler's readl()).
> +      */
> +     enabled = atomic_read(&aic_vipi_enable[this_cpu]);
> +
> +     /*
> +      * Clear the IPIs we are about to handle. This pairs with the
> +      * atomic_fetch_or_release() in aic_ipi_send_mask(), and needs to be
> +      * ordered after the aic_ic_write() above (to avoid dropping vIPIs) and
> +      * before IPI handling code (to avoid races handling vIPIs before they
> +      * are signaled). The former is taken care of by the release semantics
> +      * of the write portion, while the latter is taken care of by the
> +      * acquire semantics of the read portion.
> +      */
> +     firing = atomic_fetch_andnot(enabled, &aic_vipi_flag[this_cpu]) & enabled;

Does this also need to be ordered after the Ack? For example, if we have
something like:

CPU 0                                           CPU 1
                                                <some other IPI>
aic_ipi_send_mask()
                                                atomic_fetch_andnot(flag)
        atomic_fetch_or_release(flag)
        aic_ic_write(AIC_IPI_SEND)
                                                aic_ic_write(AIC_IPI_ACK)

Sorry if it's a stupid question; I'm just not sure about the cases in
which the hardware will pend things for you.
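
If it does need ordering, I suppose you'd want something between the Ack
and the andnot; a rough sketch (assuming an aic_ic_read() helper, and
that a read-back of the Ack register is sufficient to order it, neither
of which I've checked):

        aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER);
        aic_ic_read(aic_irqc, AIC_IPI_ACK); /* order the Ack before the flag update */
        ...
        firing = atomic_fetch_andnot(enabled, &aic_vipi_flag[this_cpu]) & enabled;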

Will
