On 2017-10-28 03:07, Pontes, Otavio wrote:
> On Tue, 2017-10-10 at 07:26 +0200, Jan Kiszka wrote:
>> On 2017-10-10 02:27, Gustavo Lima Chaves wrote:
>>> * Jan Kiszka <[email protected]> [2017-09-14 08:29:30 +0200]:
>>>
>>> [...]
>>>
>>>>>> continue to hit the root cell, even if they originate from a
>>>>>> function that no longer belongs to it.
>>>>>>
>>>>>> On the other hand, denying write access alone does not solve the
>>>>>> issue either, unless we enforce disabling of AER on device
>>>>>> hand-over. Were you seeing intercepted write accesses during
>>>>>> Jailhouse operation?
>>>>>
>>>>> Yes, we were, when splitting some network adaptors (PCI
>>>>> functions).
>>>>> Otavio says the errors come even if the drivers for those
>>>>> adaptors are not present in either the root or the inmate Linux
>>>>> instance, which is odd. The errors seem to be non-recoverable and
>>>>> non-fatal. The problem is that the partitioning can't even be
>>>>> done without support for the capability.
>>>>>
>>>>> I'm new to this (PCI-handling) field in Jailhouse, so I might
>>>>> need
>>>>> some light. I took a quick look at
>>>>> https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf
>>>>>  to see
>>>>> at least some implementation direction on AERs. Now, how should
>>>>> the TODO entry (in Jailhouse) "PCI AER", under "Hardware error
>>>>> handling", be tackled?
>>>>
>>>> An important but as yet unaddressed topic for Jailhouse is error
>>>> discovery, reporting or even recovery, specifically when going
>>>> safe. AER is apparently the feature that allows you to diagnose
>>>> PCI problems. If this
>>>> feature can be cleanly partitioned at function level, and no
>>>> hypervisor
>>>> interaction is required during runtime, we can probably leave the
>>>> task
>>>> up to the cell itself. If not, the hypervisor should eventually
>>>> take
>>>> care of catching and forwarding those reports.
>>>
>>> So more on this...
>>>
>>> If the root cell is configured with CONFIG_PCIEAER=y, its .remove()
>>> function will be issued at the time Jailhouse claims root complex
>>> devices, which leads to aer_disable_rootport(). That will walk all
>>> child functions, writing to their PCIe capability (Device Control
>>> Register), disabling error reporting for all of them.
>>>
>>> This is the first issue, since the default access flag for this
>>> capability in the config creation tool is RO.
>>>
>>> It then proceeds to write to the root complex's AER capability
>>> registers Root Error Command/Status. Again, this capability is
>>> currently RO from Jailhouse.
>>>
>>> Making this latter capability RW at the config creation tool would
>>> be
>>> harmless IMO, since everything there seems to be specific to the
>>> function and, moreover, Linux inmates won't be able to access PCIe
>>> Extended Capabilities anyway.
>>>
>>> Is my last statement correct? Because from what I see,
>>> jailhouse_setup_data does not pass anything about mmcfg_start/size,
>>> now, so Linux inmates are left to PIO-only access to PCI
>>> configuration
>>> space, right? PCI manual says:
>>
>> The fact that we do not yet provide our inmates a clue where the
>> MMCFG
>> space is located doesn't change the fact that it is already
>> accessible
>> to them. I think we can and should fix that former aspect soon by
>> communicating the location via the comm region of the cell and
>> setup_data of the boot loader and then extending the pv patches for
>> Linux to call pci_mmconfig_insert on x86.
> 
> I have some patches using the comm region to export the PCI
> mmconfig base address. I'll send them following this email. But this
> won't help much with handling AERs on inmates because the inmate Linux
> would still need access to the PCI root port to get AER irqs.
> 
> Any opinions on how to attack Hardware Error Handling in Jailhouse?
> Should we try to capture AER irqs and find a way to forward them?

If the AER IRQs can't be partitioned along device ownership, we will
have to catch and reinject them, I suppose. How complex could this be?
Would we have to emulate a root port for non-root cells then?
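
To make the "catch and reinject" idea a bit more concrete, here is a rough
sketch, not actual Jailhouse code. It assumes the hypervisor reads the root
port's Root Error Source Identification register (AER capability offset
0x34: correctable source ID in bits 15:0, uncorrectable source ID in bits
31:16) and looks up the reporting function in an ownership table. The names
fn_owner, owner_of, aer_target_cell and NO_CELL are all made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Root Error Source Identification register layout (AER cap, offset 0x34). */
#define AER_ROOT_ERR_SRC_COR(v)   ((uint16_t)((v) & 0xffff))
#define AER_ROOT_ERR_SRC_UNCOR(v) ((uint16_t)((v) >> 16))

#define NO_CELL (-1)

/* Hypothetical ownership entry: which cell a PCI function is assigned to. */
struct fn_owner {
	uint16_t bdf;	/* requester ID: bus << 8 | device << 3 | function */
	int cell_id;
};

/* Linear lookup of the owning cell; returns NO_CELL if the BDF is unknown. */
int owner_of(const struct fn_owner *tbl, size_t n, uint16_t bdf)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].bdf == bdf)
			return tbl[i].cell_id;
	return NO_CELL;
}

/*
 * Pick the cell that should receive a reinjected AER report.
 * err_src is the raw register value; correctable selects which half of
 * the register identifies the reporting function.
 */
int aer_target_cell(const struct fn_owner *tbl, size_t n,
		    uint32_t err_src, int correctable)
{
	uint16_t bdf = correctable ? AER_ROOT_ERR_SRC_COR(err_src)
				   : AER_ROOT_ERR_SRC_UNCOR(err_src);

	return owner_of(tbl, n, bdf);
}
```

This only covers the routing decision. How the report would actually be
delivered to a non-root cell (emulated root port or something else) is
exactly the open question.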

In any case, exploring the thing along this line seems reasonable. Is
there a way to emulate AER events? Hmm, QEMU seems to have some logic
for that (monitor command "pcie_aer_inject_error").

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
