On 28.07.21 14:10, Huang Shihua wrote:
>
>
> On Mon, Jul 26, 2021 at 8:08 PM Jan Kiszka <[email protected]> wrote:
>
> On 26.07.21 19:14, Huang Shihua wrote:
> >
> >
> > On Wednesday, 21 July 2021 at 17:50:53 UTC+2 [email protected] wrote:
> >
> > On 13.07.21 18:09, Huang Shihua wrote:
> > > Hi,
> > >
> > > Currently, I'm trying to run the ivshmem-demo to establish
> > > communication between the Linux root cell and one non-root cell.
> > > Configuration files are attached.
> > >
> > > Two cases were tested:
> > >
> > > 1. Let the non-root cell load the ivshmem-demo and then target
> > > itself (target=1). _All interrupts can be sent and received correctly_.
> > > 2. Let the root cell and the non-root cell send interrupts to each
> > > other. I.e., the root cell runs ./tools/demos/ivshmem-demo -t 1, while
> > > the non-root cell loads inmates/demos/x86/ivshmem-demo.bin -s
> > > "target=0" -a 0x1000 and then runs. The result turned out to be:
> > > * the non-root cell got the interrupts from the root cell,
> > > * _while the root cell did not receive any interrupt._
> > >
> > > As Jan mentioned in
> > > https://groups.google.com/g/jailhouse-dev/c/GRCWFzNaHX8/m/ht8z51BOCgAJ,
> > > tuning the iommu index should do the trick.
> > > However, unfortunately, it did not work for me :c
> > >
> > > There are 8 iommu units on the hardware. I tuned the iommu index in the
> >
> > Wow, 8 units...
> >
> > > root cell configuration from 0 to 7. The same behavior, no interrupts
> > > were received by the root cell, remains when tuning the index from 0
> > > to 6. When the iommu is set to 7, the kernel crashed immediately when
> > > the demo was started on the non-root cell.
> > >
> > > Any idea regarding why the root cell always failed to receive
> > > interrupts?
> >
> > This may require in-detail debugging. For that, you would have to
> > instrument the hypervisor along its virtual IRQ injection path. That
> > starts in ivshmem_trigger_interrupt() (hypervisor/ivshmem.c). The
> > sending side will call it on writing the doorbell registers. Check along
> > this call path if conditions to actually send the IRQ are not met.
> >
> > If all are met, the hypervisor sends an IPI to a target cell CPU (will
> > be directly delivered to the guest) that should cause the normal IRQ
> > processing there. But usually, we do not get so far in such cases.
> >
> > Another function of interest here is arch_ivshmem_update_msix() when
> > called for the root cell while it defines where ivshmem IRQs should go
> > to. Possibly, Jailhouse decides that the programming Linux issued is not
> > valid and therefore leaves the irq_cache that
> > arch_ivshmem_trigger_interrupt() uses invalid. You can also check that
> > via instrumentations (printk).
> >
> >
> > Indeed, when .iommu is assigned as 0,1,..6, irq_cache is invalid.
> > I suspect the reason is that their corresponding VT-d interrupt
> > remapping table entries are not for ivshmem devices, i.e., unmatched
> > device ID.
> > When .iommu is tuned to 7, irq_cache becomes valid.
> >
>
> OK, then we know what needs to be set. I will have to check eventually
> if we can read out that information also from sysfs so that this
> guessing can end.
>
> > (BTW, as I mentioned before, the kernel crashed immediately when the
> > demo was started on the non-root cell. _One missing detail here is_: on
> > the root-cell side, ./tools/demos/ivshmem-demo is running/has run, i.e.,
> > init_control has been set to 1. If ./tools/demos/ivshmem-demo has not
> > been run on the root cell yet, then starting the demo on the non-root
> > cell will not kill the kernel.)
>
> Now we need to understand the crash. The root cell kernel oopses, right?
> Any logs from that?
>
>
> Activating hypervisor
> CAT: Using COS 0 with bitmask 000007ff for cell ivshmem-demo
> Adding virtual PCI device 00:0e.0 to cell "ivshmem-demo"
> Shared memory connection established, peer cells:
> "RootCell"
> Created cell "ivshmem-demo"
> Page pool usage after cell creation: mem 938/3534, remap 65603/131072
> Cell "ivshmem-demo" can be loaded
> CPU 1 received SIPI, vector 100
> Started cell "ivshmem-demo"
> IVSHMEM: Found device at 00:0e.0
> IVSHMEM: bar0 is at 0x00000000ff000000
> IVSHMEM: bar1 is at 0x00000000ff001000
> IVSHMEM: ID is 1
> IVSHMEM: max. peers is 3
> IVSHMEM: state table is at 0x000000003f0f0000
> IVSHMEM: R/W section is at 0x000000003f0f1000
> IVSHMEM: input sections start at 0x000000003f0fa000
> IVSHMEM: output section is at 0x000000003f0fc000
> IVSHMEM: initialized device
> state[0] = 0
> state[1] = 2
> state[2] = 0
> rw[0] = -1347440721
> rw[1] = 0
> rw[2] = -1347440721
> in@0x0000 = -1347440721
> in@0x2000 = 0
> in@0x4000 = -1347440721
>
> IVSHMEM: sending IRQ 2 to peer 2
>
> IVSHMEM: sending IRQ 2 to peer 2
> <---------- ./tools/demos/ivshmem-demo -t 1 (root cell)
> IVSHMEM: got interrupt 0 (#1)
> state[0] = 0
> state[1] = 2
> state[2] = 3
> rw[0] = -1347440721
> rw[1] = 0
> rw[2] = 0
> in@0x0000 = -1347440721
> in@0x2000 = 0
> in@0x4000 = 0
>
> IVSHMEM: sending IRQ 2 to peer 2
> FATAL: Unhandled VM-Exit, reason 26
Root cell is issuing a VMXOFF instruction - could come from
cpu_emergency_vmxoff().
> qualification 0
> vectoring info: 0 interrupt info: 0
> RIP: 0xffffffff8d05f6ae RSP: 0xffffafa9c0003fc0 FLAGS: 2
That RIP is likely pointing to that function in the kernel. But we
rather need a backtrace. Please try CONFIG_CRASH_CELL_ON_PANIC (see
Documentation/hypervisor-configuration.md).
> RAX: 0x00000000007626f0 RBX: 0x0000000000000000 RCX: 0x000000007ffefbff
> RDX: 0x00000000bfebfbff RSI: 0xffffafa9c0003fc8 RDI: 0xffffafa9c0003fc4
> CS: 10 BASE: 0x0000000000000000 AR-BYTES: a09b EFER.LMA 1
> CR0: 0x0000000080050033 CR3: 0x0000001fbd80a004 CR4: 0x00000000007626f0
> EFER: 0x0000000000000d01
> Parking CPU 0 (Cell: "RootCell")
>
> IVSHMEM: sending IRQ 2 to peer 2
> Ignoring NMI IPI to CPU 0
> Ignoring NMI IPI to CPU 2
> Ignoring NMI IPI to CPU 3
> Ignoring NMI IPI to CPU 5
> Ignoring NMI IPI to CPU 6
> Ignoring NMI IPI to CPU 7
> Ignoring NMI IPI to CPU 8
> Ignoring NMI IPI to CPU 9
> Ignoring NMI IPI to CPU 10
> Ignoring NMI IPI to CPU 11
> Ignoring NMI IPI to CPU 12
> Ignoring NMI IPI to CPU 13
> Ignoring NMI IPI to CPU 14
> Ignoring NMI IPI to CPU 15
>
> IVSHMEM: sending IRQ 2 to peer 2
>
>
>
> And what do you mean with init_control?
>
>
> oops, typo, should be int_control...
> the int_control of struct ivshm_regs in ivshmem-demo.c
> struct ivshm_regs {
> uint32_t id;
> uint32_t max_peer;
> uint32_t int_control;
> .....
> }
> _so when the root cell mmio_writes 1 to regs->int_control while the
> non-root cell has been running, then the kernel crashes._
>
That write opens the gate of ivshmem interrupts for the root cell.
>
> >
> > To avoid the kernel crashing situation, I only ran the demo on the
> > non-root cell. With .iommu being set validly, I would expect to at
> > least see the interrupt count increase when running
> > grep ivshmem /proc/interrupts.
> > But nope, _still no interrupts received on the root cell_.
> >
>
> If there is no driver registered on the root side or not opened (by the
> demo app), then the interrupt reception is disabled. We need to debug
> the "hot" case.
>
>
> Right, after diving into the source code, I did see that when
> ive->int_ctrl_reg=0, no interrupt will be triggered, i.e.,
> arch_ivshmem_trigger_interrupt is skipped.
>
> I have a question regarding the code below.
> static void ivshmem_trigger_interrupt(struct ivshmem_endpoint *ive,
>                                       unsigned int vector)
> {
>         /*
>          * Hold the IRQ lock while sending the interrupt so that ivshmem_exit
>          * and ivshmem_register_mmio can synchronize on the completion of the
>          * delivery.
>          */
>         spin_lock(&ive->irq_lock);
>
>         if (ive->int_ctrl_reg & IVSHMEM_INT_ENABLE) {
>                 if (ive->cspace[IVSHMEM_CFG_VNDR_CAP/4] &
>                     IVSHMEM_CFG_ONESHOT_INT)
>                         ive->int_ctrl_reg = 0;
>
>                 arch_ivshmem_trigger_interrupt(ive, vector);
>         }
>
>         spin_unlock(&ive->irq_lock);
> }
>
> Q1: IVSHMEM_CFG_ONESHOT_INT means?
> Q2: What does meeting this condition mean,
> ive->cspace[IVSHMEM_CFG_VNDR_CAP/4] & IVSHMEM_CFG_ONESHOT_INT?
> Q3: Why trigger_interrupt when ive->int_ctrl_reg = 0?
See Documentation/ivshmem-v2-specification.md, "one-shot interrupt mode".
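
To illustrate Q1-Q3, here is a standalone sketch of the gate in the
function quoted above. It is not hypervisor code: endpoint_sketch,
INT_ENABLE and trigger() are names made up for this example and only
mirror the quoted logic. In one-shot mode the enable bit is cleared and
the current interrupt is still delivered, so the receiver has to set
int_control again before the next interrupt can pass.

```c
#include <stdbool.h>
#include <stdint.h>

#define INT_ENABLE 0x1u /* hypothetical stand-in for IVSHMEM_INT_ENABLE */

/* Hypothetical, reduced model of the receiving endpoint's state. */
struct endpoint_sketch {
	uint32_t int_ctrl_reg;  /* written by the receiving guest */
	bool one_shot;          /* stand-in for the IVSHMEM_CFG_ONESHOT_INT test */
	unsigned int delivered; /* interrupts that passed the gate */
};

/* Returns true if the interrupt passed the gate, false if it was dropped. */
static bool trigger(struct endpoint_sketch *ep)
{
	if (!(ep->int_ctrl_reg & INT_ENABLE))
		return false;         /* reception disabled: nothing is sent */

	if (ep->one_shot)
		ep->int_ctrl_reg = 0; /* mask the following interrupts ... */

	ep->delivered++;          /* ... but still deliver this one */
	return true;
}
```

With one_shot set, a second call to trigger() returns false until
int_ctrl_reg is set to INT_ENABLE again.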
> Q4: I tried to add "else" a line above arch_ivshmem_trigger_interrupt,
> i.e., arch_ivshmem_trigger_interrupt is skipped when
> ./tools/demos/ivshmem-demo -t 1 was executed on the root cell, thus no
> kernel crash,
> non-root can later receive interrupt #!0 from the root cell, and :) yeah
> the root cell still receives nothing.

This does not change the fundamental issue that even a single
interrupt causes problems for the root cell.

What you could do in addition to obtaining the backtrace from Linux is
getting the details of the IPI sent to the root-cell CPU. Which APIC
parameters are used there? Or just print the content of
ive->irq_cache.msg[vector] in arch_ivshmem_trigger_interrupt.

Background: If something went wrong, we may not deliver a normal
interrupt but rather some INIT/SIPI or whatever event.

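
A sketch of how such a dump could be interpreted. The struct below is an
assumption for illustration (irq_msg_sketch and its field names are
modeled on what an APIC interrupt message typically carries; check them
against the real definition in your tree). The delivery mode encodings
are the standard x86 APIC ones. Anything other than Fixed or Lowest
Priority with a sane vector would point at the INIT/SIPI-style problem
described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration only; verify against the hypervisor. */
struct irq_msg_sketch {
	uint8_t vector;
	uint8_t delivery_mode;
	uint8_t dest_logical;
	uint8_t level_triggered;
	uint8_t redir_hint;
	uint8_t valid;
	uint32_t destination;
};

/* x86 APIC delivery mode encodings (3-bit field). */
static const char *delivery_mode_name(uint8_t mode)
{
	switch (mode) {
	case 0: return "Fixed";
	case 1: return "Lowest Priority";
	case 2: return "SMI";
	case 4: return "NMI";
	case 5: return "INIT";
	case 6: return "Start-up (SIPI)";
	case 7: return "ExtINT";
	default: return "Reserved";
	}
}

/*
 * True if the message describes an ordinary device interrupt: marked
 * valid, Fixed or Lowest Priority delivery, and a vector the APIC
 * accepts (vectors 0-15 are illegal for such interrupts).
 */
static bool looks_like_normal_irq(const struct irq_msg_sketch *msg)
{
	return msg->valid &&
	       (msg->delivery_mode == 0 || msg->delivery_mode == 1) &&
	       msg->vector >= 0x10;
}
```

Printing the vector, delivery mode and destination fields with printk
right before the IPI is sent, then running them through a check like
this, should show whether the root cell is hit by a normal interrupt.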
Jan
--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux