Hi,
On 6/17/19 6:04 PM, Jan Kiszka wrote:
> On 17.06.19 16:14, Ralf Ramsauer wrote:
>> Hi Jan,
>>
>> On 6/17/19 12:47 PM, Jan Kiszka wrote:
>>> On 17.06.19 12:18, Ralf Ramsauer wrote:
>>>>
>>>>
>>>> On 6/17/19 12:15 PM, Jan Kiszka wrote:
>>>>> On 17.06.19 12:11, Ralf Ramsauer wrote:
>>>>>> Hi Jan,
>>>>>>
>>>>>> On 6/17/19 9:49 AM, Jan Kiszka wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> by the end of this week, I'd like to tag a new release. If you have
>>>>>>> anything pending that should be included, make sure to post it soon.
>>>>>>> My integration queue is empty, so also let me know if I missed
>>>>>>> something.
>>>>>>
>>>>>> Andrej still has two patches in his queue, but they're not necessarily
>>>>>> required for v0.11.
>>>>>>
>>>>>> There's still the MSR bitmap issue on AMD64. Valentine hasn't responded
>>>>>> yet, so I'll have a look at that soon. It's an open issue that should
>>>>>> be fixed.
>>>>>
>>>>> Let me look into that.
>>>>
>>>> Ok. (Maybe that could also be the issue why apic-demo shows implausible
>>>> timings on amd64)
>>>>
>>>
>>> Followed up on that thread. Should be quickly resolvable.
>>>
>>>>>
>>>>>>
>>>>>> Other than that, by now I'm pretty sure that there's something odd
>>>>>> with VT-d, but I can't yet tell what exactly.
>>>>>
>>>>> Do you need me to write an instrumentation patch?
>>>>
>>>> Maybe. Let me try some other things I wanted to test last week.
>>>> Otherwise I'll get back to you.
>>>>
>>>
>>> FWIW, please try this nevertheless:
>>>
>>> diff --git a/hypervisor/arch/x86/vtd.c b/hypervisor/arch/x86/vtd.c
>>> index 1cae0dcb..110184fa 100644
>>> --- a/hypervisor/arch/x86/vtd.c
>>> +++ b/hypervisor/arch/x86/vtd.c
>>> @@ -567,6 +567,7 @@ static void vtd_update_irte(unsigned int index, union vtd_irte content)
>>> void *reg_base = dmar_reg_base;
>>> unsigned int n;
>>> +printk("%s: index %d, present %d, content %016llx %016llx\n", __func__, index, content.field.p, content.raw[0], content.raw[1]);
>>> if (content.field.p) {
>>> /*
>>> * Write upper half first to preserve non-presence.
>>> @@ -824,6 +825,7 @@ int iommu_map_interrupt(struct cell *cell, u16 device_id, unsigned int vector,
>>> union vtd_irte irte;
>>> int base_index;
>>> +printk("%s: device %04x, vector %d, irq_msg %016llx\n", __func__, device_id, vector, *(u64 *)&irq_msg);
>>> base_index = vtd_find_int_remap_region(device_id);
>>> if (base_index < 0)
>>> return base_index;
>>
>> Please find the hypervisor log and the sysconfig attached. Just for
>> completeness, some other useful stuff is attached, including the output
>> of jailhouse config collect.
>>
>> One remark to the Jailhouse output in console.txt:
>>
>> Everything looks unsuspicious until we add PCI device b3:00.0 that needs
>> to reserve 97(!) interrupts.
>
> Well, that's likely what this thing could potentially use, based on its
> MSI-X vector limit (unless we have a bug in reading that from the
> hardware -> config generator).
>
>>
>> This is the point where instrumentation starts to output:
>>
>> iommu_map_interrupt: device 0400, vector 3, irq_msg 0001080000002822
vtd_update_irte: index 109, present 0, content 0000000000000100 0000000000000400
Short intermediate analysis:

The first non-present entry is non-present because !irq_msg.valid. irq_msg
is passed in via arch_pci_update_msix_vector and comes from
x86_pci_translate_msi. Inside x86_pci_translate_msi,
iommu_cell_emulates_ir(device->cell) is true, so we call
iommu_get_remapped_root_int.

Inside iommu_get_remapped_root_int, irq_msg.valid is set to zero
at vtd.c:803:

        irq_msg.valid =
                (root_irte.field.p && root_irte.field.sid == device_id);

This is why the present bit of the IRTE ends up being zero. And here it
gets interesting:

  root_irte.field.p: 1
  root_irte.field.sid: 401
  devid: 400

Hmm. Either devid or sid is wrong. Let's skip a few messages and look at
the other non-present entries. Same code path.

  root_irte.field.p: 1
  root_irte.field.sid: 400
  devid: 401

400 and 401 are the two PHYs of the same network card (04:00.0 and
04:00.1). Something is swapping them.

I'm not done debugging yet, but this seems to be the root cause.
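
The check can be replayed in isolation with the values observed above. A
minimal sketch, assuming the logged sid/devid values are hexadecimal;
struct root_irte_fields, msg_valid and the bdf_* helpers are hypothetical
stand-ins, not the actual Jailhouse definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields of the root cell's IRTE
 * that the validity check looks at. */
struct root_irte_fields {
	bool p;        /* present bit */
	uint16_t sid;  /* source ID programmed by the root cell */
};

/* Same condition as the irq_msg.valid assignment quoted above. */
static bool msg_valid(struct root_irte_fields root_irte, uint16_t device_id)
{
	return root_irte.p && root_irte.sid == device_id;
}

/* Split a 16-bit PCI device ID into bus:dev.func. */
static unsigned int bdf_bus(uint16_t id)  { return id >> 8; }
static unsigned int bdf_dev(uint16_t id)  { return (id >> 3) & 0x1f; }
static unsigned int bdf_func(uint16_t id) { return id & 0x7; }
```

With sid 0x401 and devid 0x400 (or the other way round), msg_valid() returns
false although the root IRTE is present, so the entry is written as
non-present. Only a matching pair, e.g. sid 0x400 with devid 0x400, would
pass.
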
Ralf
>>
>> The hypervisor stalls for a moment when printing those lines. In total, it
>> takes a few seconds for Jailhouse to enable.
>>
>> b3:00.0 is a megaraid/megasas standard raid controller, 04:00.0 and
>> 04:00.1 (0x400, 0x401) are the Broadcom network devices that we lose.
>>
>
> Let's pick the first:
>
> VT-d fault event reported by IOMMU 3:
> Source Identifier (bus:dev.func): 04:00.1
> Fault Reason: 0x22 Fault Info: 1f000000000 Type 0
>
> Interestingly, we program not a single present IRTE for that device. So
> the next thing to check is why that is the case, e.g. what happens between
>
> iommu_map_interrupt: device 0401, vector 0, irq_msg 0001004000002822
>
> and
>
> vtd_update_irte: index 123, present 0, content 0000000000000100
> 0000000000000401
>
>>>
>>> This should list the IRTE entries that are written or invalidated. When
>>> matching their number and device ID against the fault later on, things
>>> may become clearer. If not, we may need to go up further in the call
>>> chain, to the callers of iommu_map_interrupt.
>>
>> After Jailhouse is enabled, and after the VT-d faults occur, we receive
>> a couple of those lines
>>
>> vtd_update_irte: index 16, present 1, content 000100100022010d
>> 000000000004f0f8
>> iommu_map_interrupt: device f0f8, vector 16, irq_msg 0001001000006822
>>
>> on every key press of the serial line.
>
> There is probably some mask/unmask going on that we see when it hits the
> virtualized registers.
>
> Jan
>