On Thu, 24 Sep 2015, Qais Yousef wrote:
> On 09/23/2015 05:54 PM, Jiang Liu wrote:
> >     Thanks for doing this, but the change is a little bigger than
> > my expectation. Could we achieve this by:
> > 1) extend irq_chip to support send_ipi operation
> > 2) reuse existing irqdomain allocation interfaces to allocate IPI IRQ
> > 3) arch code to create an IPI domain for IPI allocations
> > 4) IRQ core provides some helpers to help arch code to implement IPI
> >     irqdomain

That's not sufficient as IPIs are different from normal interrupts
because we need an interface to actually send them.

> Can you be more specific about 2 please? I tried to reuse the hierarchy
> irqdomain alloc function. One major difference when allocating IPI than a
> normal irq is that it's dynamic. The caller doesn't know what hwirq number it
> needs. It actually shouldn't.

Right. But we have the same behaviour with e.g. MSI. The caller does
not know a hardware irq number because it is dynamically assigned.
 
> The idea is for the user to just say 'I want an IPI to a CPUAFFINITY' from DT
> and get a virq in return to send an IPI to the target CPU(s). Also I think we
> need to accommodate the possibility of having more than 1 IPI controller.

Having more than one IPI controller is not a problem. It's going to be
a separate IPI domain, which you select from DT or other means.
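To make the selection step concrete, here is a minimal userspace sketch of picking an IPI domain from a list of registered domains. An explicit compatible string (standing in for an OF match) must match exactly, and passing no string selects the default domain. struct ipi_domain, its fields and ipi_domain_select() are hypothetical illustrations, not kernel interfaces. Returning NULL on a failed match, rather than silently falling back to the default domain, is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered IPI irq domain. */
struct ipi_domain {
	const char *of_compatible;	/* what a DT node would be matched against */
	int is_default;			/* used when the caller names no domain */
};

/*
 * Select an IPI domain: with a compatible string, only an exact match is
 * returned; without one, the default domain is returned. NULL if none fits.
 */
static struct ipi_domain *ipi_domain_select(struct ipi_domain *doms, size_t n,
					    const char *compatible)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (compatible) {
			if (strcmp(doms[i].of_compatible, compatible) == 0)
				return &doms[i];
		} else if (doms[i].is_default) {
			return &doms[i];
		}
	}
	return NULL;
}
```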

These IPI domains are implemented like the MSI domain as child
domains of the underlying irq domain.

     [IPI domain] ---> [GIC domain]

like we have on x86

     [MSI domain] ---> [Vector domain]
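A toy model of that layering, assuming simplified stand-ins for struct irq_chip and struct irq_data (only a hwirq, a chip and a parent pointer; these are not the kernel definitions). Each domain level owns its own irq_data, and the child's send_ipi() only forwards to its parent, which is the level that actually touches the hardware. root_send_ipi() and child_send_ipi() are hypothetical.

```c
#include <stddef.h>

struct irq_data;

/* Simplified chip model: only the proposed send_ipi callback. */
struct irq_chip {
	const char *name;
	void (*send_ipi)(struct irq_data *data, unsigned long mask);
};

/* One level of the hierarchy; each domain keeps its own irq_data. */
struct irq_data {
	unsigned long hwirq;
	struct irq_chip *chip;
	struct irq_data *parent;	/* NULL at the root (e.g. GIC) */
};

/* What the root level was last asked to raise, for inspection. */
static unsigned long last_root_hwirq, last_root_mask;

static void root_send_ipi(struct irq_data *d, unsigned long mask)
{
	/* Stand-in for writing the interrupt controller registers. */
	last_root_hwirq = d->hwirq;
	last_root_mask = mask;
}

static void child_send_ipi(struct irq_data *d, unsigned long mask)
{
	/* The child domain does not own hardware; delegate to the parent. */
	d->parent->chip->send_ipi(d->parent, mask);
}
```

Usage: with `root = { 42, &gic, NULL }` and `child = { 0, &ipi, &root }`, calling `child.chip->send_ipi(&child, 0x5)` ends up at the root with hwirq 42 and mask 0x5.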

So you need some infrastructure, which allows you to:

 - allocate IPI(s)

     Requests IPI(s) from an IPI domain. That might be the default IPI
     domain or one that is matched via OF against a list of registered
     domains or one which is known to the caller by other means.

     Now that allocation interface does:

      1) Allocate irq descriptor

         This is required even for IPIs which are targeted to
         coprocessors and cannot be requested from Linux. In that case
         the only purpose is to store the irq chip and the irq domain
         specific data for that virq/hwirq mapping and the irq is
         marked as NOREQUEST.

      2) Allocate the vector/hwirq number block from the IPI domain
         
         Part of the allocation request info is a pointer to the
         target cpu mask. The weight of the target cpu mask is the
         number of hwirqs you need to allocate from the underlying
         domain.

         For a normal Linux IPI, this will be the number of possible
         CPUs. For a coprocessor IPI, this will be a single hwirq.

         We also store that target cpu mask for runtime validation and
         other usage in the irq descriptor data. We can actually reuse
         the existing affinity mask for that.

         Now how these hwirqs are allocated is a domain/architecture
         specific issue.

         x86 will just find a vector which is available on all target
         cpus and mark it as used. That's a single hw irq number.

         mips and others, which implement IPIs as regular hw interrupt
         numbers, will allocate these (consecutive) hw interrupt
         numbers either from a reserved region or just from the
         regular space. That's a bunch of hw irq numbers and we need
         to come up with a proper storage format in the irqdata for
         that. That might be

               struct ipi_mapping {
                      unsigned int      nr_hwirqs;
                      unsigned int      cpumap[NR_CPUS];
               };

         or some other appropriate storage format like:

               struct ipi_mapping {
                      unsigned int      hwirq_base;
                      unsigned int      cpu_offset;
                      unsigned int      nr_hwirqs;
               };

         which is less space consuming, but restricted to consecutive
         hwirqs which can be mapped to the cpu number linearly:

                hwirq = hwirq_base + cpu - cpu_offset;
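The compact format can be modelled in a few lines. This is a minimal userspace sketch, assuming a CPU mask fits in a uint64_t; cpumask_weight64(), ipi_mapping_hwirq() and INVALID_HWIRQ are hypothetical helpers, not kernel APIs. It shows the weight of a contiguous target mask giving nr_hwirqs, and the linear lookup rejecting CPUs outside the covered range.

```c
#include <stdint.h>

#define INVALID_HWIRQ	(~0u)	/* hypothetical "no hwirq for this cpu" marker */

/* Compact mapping: consecutive hwirqs, linear in the cpu number. */
struct ipi_mapping {
	unsigned int hwirq_base;
	unsigned int cpu_offset;
	unsigned int nr_hwirqs;
};

/* Weight of the target mask = number of hwirqs to allocate. */
static unsigned int cpumask_weight64(uint64_t mask)
{
	unsigned int w = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		w++;
	}
	return w;
}

/* hwirq = hwirq_base + cpu - cpu_offset, for cpus inside the range only. */
static unsigned int ipi_mapping_hwirq(const struct ipi_mapping *map,
				      unsigned int cpu)
{
	if (cpu < map->cpu_offset || cpu - map->cpu_offset >= map->nr_hwirqs)
		return INVALID_HWIRQ;
	return map->hwirq_base + cpu - map->cpu_offset;
}
```

For a target mask of CPUs 2..5 (0x3c) and hwirq_base 100, the weight is 4, CPU 2 maps to hwirq 100 and CPU 5 to hwirq 103. The format only works when the target CPUs are contiguous; a mask with holes needs the per-cpu cpumap variant.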
         
       The result of this is a single virq number, which has all the
       necessary information stored in the associated irq descriptor
       and the domain specific hierarchical irq_data.

       For normal Linux IPIs that irq is marked as per cpu irq and can
       be requested via request_percpu_irq() and enabled/disabled via
       enable_percpu_irq/disable_percpu_irq on CPU hot[un]plug.

 - A function to send an IPI to a virq number

     That function takes the virq number and a target cpumask as
     arguments.

     Actually we want two functions where the one which takes a virq
     number is a wrapper around the other which takes an irq descriptor
     pointer.

     The one which takes the virq number can be exported to drivers,
     the other one is a core/arch code only interface. The reason for
     this is that we want to avoid the irq descriptor lookup for
     regular IPIs, but exposing the descriptor to drivers is a NONO.

     int irq_send_ipi(int virq, const struct cpumask *mask)
     {
        struct irq_desc *desc = irq_to_desc(virq);

        if (!desc)
                return -EINVAL;

        return irq_desc_send_ipi(desc, mask);
     }

     Along with a version which sends an IPI to all cpus in the target
     mask:

     int irq_send_ipi_all(int virq)
     {
        struct irq_desc *desc = irq_to_desc(virq);
        struct irq_data *data;

        if (!desc)
                return -EINVAL;

        data = irq_desc_get_irq_data(desc);
        return irq_desc_send_ipi(desc, irq_data_get_affinity_mask(data));
     }
     
     And the internal function:

     int irq_desc_send_ipi(struct irq_desc *desc, const struct cpumask *mask)
     {
        struct irq_data *data = irq_desc_get_irq_data(desc);
        struct irq_chip *chip = irq_data_get_irq_chip(data);

        if (!chip || !chip->send_ipi)
                  return -EINVAL;

        /*
         * Do not validate the mask for IPIs marked global. These are
         * regular IPIs so we can avoid the operation as their target
         * mask is the cpu_possible_mask.
         */
        if (!irqd_is_global_ipi(data)) {
           if (!cpumask_subset(mask, irq_data_get_affinity_mask(data)))
                  return -EINVAL;
        }

        chip->send_ipi(data, mask);
        return 0;
     }
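The send path can be exercised outside the kernel with a flattened model. This sketch assumes CPU masks fit in a uint64_t and folds the descriptor, the stored affinity mask and the global-IPI flag into one hypothetical struct model_ipi_desc; the desc_table[] lookup in model_send_ipi() stands in for irq_to_desc(). It follows the rules above: an unknown virq or a missing send_ipi gives -EINVAL, and a mask that is not a subset of the stored affinity is rejected unless the IPI is global.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened descriptor: affinity, global flag, chip hook. */
struct model_ipi_desc {
	uint64_t affinity;	/* target mask stored at allocation time */
	bool global;		/* regular IPI: target is all possible cpus */
	void (*send_ipi)(struct model_ipi_desc *desc, uint64_t mask);
	uint64_t last_sent;	/* filled in by the demo send_ipi hook */
};

#define MODEL_NR_IRQS	16
static struct model_ipi_desc *desc_table[MODEL_NR_IRQS];

/* Core/arch level: no lookup, caller already holds the descriptor. */
static int model_desc_send_ipi(struct model_ipi_desc *desc, uint64_t mask)
{
	if (!desc->send_ipi)
		return -EINVAL;
	/* Subset check, skipped for global IPIs. */
	if (!desc->global && (mask & ~desc->affinity))
		return -EINVAL;
	desc->send_ipi(desc, mask);
	return 0;
}

/* Driver level: pays for the virq -> descriptor lookup. */
static int model_send_ipi(int virq, uint64_t mask)
{
	if (virq < 0 || virq >= MODEL_NR_IRQS || !desc_table[virq])
		return -EINVAL;
	return model_desc_send_ipi(desc_table[virq], mask);
}

/* Counterpart of irq_send_ipi_all(): target the whole stored mask. */
static int model_send_ipi_all(struct model_ipi_desc *desc)
{
	return model_desc_send_ipi(desc, desc->affinity);
}

/* Demo chip hook: just remember what would have been sent. */
static void record_send(struct model_ipi_desc *desc, uint64_t mask)
{
	desc->last_sent = mask;
}
```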

     So now the chip specific send_ipi function will deal with the
     underlying implementation details.

     On x86 it uses the selected APIC implementation and sends
     the IPI to the vector stored in the hw irq number to all CPUs
     which are in the mask.

     On mips and others it's a bit different, as you need to figure out
     the effective hwirq number for the cpus set in the target mask
     from the stored mapping in the hierarchical irq data. We
     certainly can create common helpers for this. Assume the simple
     mapping format:

               struct ipi_mapping {
                      unsigned int      nr_hwirqs;
                      unsigned int      cpumap[];
               };
     
     then a helper function for the IPI domain irq chip would be:

     void irq_chip_send_ipi(struct irq_data *data, const struct cpumask *mask)
     {
        struct ipi_mapping *map = irq_data_get_irq_chip_data(data);
        struct irq_data *parent = data->parent;
        unsigned int cpu, hwirq;

        for_each_cpu(cpu, mask) {
                hwirq = map->cpumap[cpu];
                /* Deal with gaps */
                if (hwirq == INVALID_HWIRQ)
                   continue;
                parent->chip->send_ipi(parent, cpumask_of(cpu));
        }
     }
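A userspace sketch of the per-cpu translation that helper performs. One assumption differs from the helper above: here the parent hook receives the looked-up hwirq and the target cpu explicitly, since the mail leaves open how the parent learns the hwirq. parent_send_ipi(), model_chip_send_ipi() and the recording arrays are hypothetical; INVALID_HWIRQ is likewise a stand-in marker.

```c
#include <stdint.h>

#define MODEL_NR_CPUS	8
#define INVALID_HWIRQ	(~0u)	/* hypothetical gap marker */

/* Simple mapping: one hwirq slot per cpu, gaps allowed. */
struct ipi_mapping {
	unsigned int nr_hwirqs;
	unsigned int cpumap[MODEL_NR_CPUS];
};

/* Records which hwirqs the parent (e.g. GIC) was asked to raise. */
static unsigned int sent_hwirq[MODEL_NR_CPUS];
static unsigned int nr_sent;

/* Assumed parent hook: raise @hwirq on @cpu. */
static void parent_send_ipi(unsigned int hwirq, unsigned int cpu)
{
	(void)cpu;
	if (nr_sent < MODEL_NR_CPUS)
		sent_hwirq[nr_sent++] = hwirq;
}

static void model_chip_send_ipi(const struct ipi_mapping *map, uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		/* Deal with gaps: cpus without a hwirq are skipped. */
		if (map->cpumap[cpu] == INVALID_HWIRQ)
			continue;
		parent_send_ipi(map->cpumap[cpu], cpu);
	}
}
```

With cpumap `{ 20, INVALID_HWIRQ, 22, 23 }` and a mask of CPUs 0..2, only hwirqs 20 and 22 reach the parent; CPU 1 is skipped as a gap.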
     
No linked lists, no other magic stuff. Just a natural extension to the
existing hierarchical irq domain code, which can be reused by all
architectures.

Thanks,

        tglx