On 2017-10-28 03:07, Pontes, Otavio wrote:
> On Tue, 2017-10-10 at 07:26 +0200, Jan Kiszka wrote:
>> On 2017-10-10 02:27, Gustavo Lima Chaves wrote:
>>> * Jan Kiszka <[email protected]> [2017-09-14 08:29:30 +0200]:
>>>
>>> [...]
>>>
>>>>>> continue to hit the root cell, even if they originate from a
>>>>>> functions that no longer belongs to it.
>>>>>>
>>>>>> On the other hand, denying write access alone does not solve the
>>>>>> issue either unless we enforce disabling of AER on device
>>>>>> hand-over. Where you seeing intercepted write accesses during
>>>>>> Jailhouse operation?
>>>>>
>>>>> Yes, we were, when splitting some network adaptors (PCI functions).
>>>>> Otavio says the errors come even if the drivers for those adaptors
>>>>> are not present on both root/inmate Linux instances, what is funny.
>>>>> The errors seem to be non-recoverable and non-fatal. Problem is the
>>>>> partitioning can't even be made without support for the capability.
>>>>>
>>>>> I'm new to this (PCI-handling) field in Jailhouse, so I might need
>>>>> some light. I took a quick look at
>>>>> https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf
>>>>> to see at least some implementation direction on AERs. Now, how
>>>>> TODO entry (in Jailhouse) "PCI AER", under "Hardware error
>>>>> handling" should be attacked?
>>>>
>>>> An important but yet unaddressed topic for Jailhouse is error
>>>> discovery, reporting or even recovery, specifically when going safe.
>>>> AER is apparently the feature that allows you diagnose PCI problems.
>>>> If this feature can be cleanly partitioned at function level, and no
>>>> hypervisor interaction is required during runtime, we can probably
>>>> leave the task up to the cell itself. If not, the hypervisor should
>>>> eventually take care of catching and forwarding those reports.
>>>
>>> So more on this...
>>>
>>> If the root cell is configured with CONFIG_PCIEAER=y, its .remove()
>>> function will be issued at the time Jailhouse claims root complex
>>> devices, which leads to aer_disable_rootport(). That will walk all
>>> child functions, writing to their PCIe capability (Device Control
>>> Register), disabling error reporting for all of them.
>>>
>>> This is the first issue, once the pristine access flag for this
>>> capability on the config creating tool is RO.
>>>
>>> It then proceeds to write on the root complex' AER capability
>>> registers Root Error Command/Status. Again, this capability is
>>> currently RO from Jailhouse.
>>>
>>> Making this latter capability RW at the config creation tool would be
>>> harmless IMO, since everything there seems to be specific to the
>>> function and, moreover, Linux inmates won't be able to access PCIe
>>> Extended Capabilities anyway.
>>>
>>> Is my last statement correct? Because from what I see,
>>> jailhouse_setup_data does not pass anything about mmcfg_start/size,
>>> now, so Linux inmates are left to PIO-only access to PCI
>>> configuration space, right? PCI manual says:
>>
>> The fact that we do not yet provide our inmates a clue where the MMCFG
>> space is located doesn't change the fact that it is already accessible
>> to them. I think we can and should fix that former aspect soon by
>> communicating the location via the comm region of the cell and
>> setup_data of the boot loader and then extending the pv patches for
>> Linux to call pci_mmconfig_insert on x86.
>
> I have some patches using the comm region to export the the pci
> mmconfig base address. I'll send them following this email. But this
> won't help much on handling AERs on inmates because the inmate Linux
> would still need access to the PCI root port to get AER irqs.
>
> Any opinions on how to attack Hardware Error Handling in Jailhouse?
> Should we try to capture AER irqs and find a way to forward them?
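
For reference, the Device Control write described in the quoted text can be pictured with a small Python model. This is only a sketch: the register offset and the four enable bits are the ones named in the PCIe spec and in Linux's pci_regs.h, while the dict standing in for config space and both helper names are made up for illustration. It does not model how Jailhouse reacts to a write to a read-only capability.

```python
# Sketch only: models clearing the error-reporting enables in the PCIe
# Device Control register. Offset and bit values follow the PCIe spec /
# Linux pci_regs.h; the dict-based "config space" and the helper names
# are hypothetical.

PCI_EXP_DEVCTL = 0x08          # Device Control, offset inside the PCIe capability
PCI_EXP_DEVCTL_CERE = 0x0001   # Correctable Error Reporting Enable
PCI_EXP_DEVCTL_NFERE = 0x0002  # Non-Fatal Error Reporting Enable
PCI_EXP_DEVCTL_FERE = 0x0004   # Fatal Error Reporting Enable
PCI_EXP_DEVCTL_URRE = 0x0008   # Unsupported Request Reporting Enable

ERROR_REPORTING_MASK = (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
                        PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)


def disable_error_reporting(devctl):
    """Return the 16-bit Device Control value with all four enables cleared."""
    return devctl & ~ERROR_REPORTING_MASK & 0xFFFF


def walk_and_disable(functions):
    """Apply the clearing to every function below a (simulated) root port.

    `functions` maps a "bb:dd.f" string to that function's current
    Device Control value; a new dict with the updated values is returned.
    """
    return {bdf: disable_error_reporting(val) for bdf, val in functions.items()}


if __name__ == "__main__":
    children = {"02:00.0": 0x281F, "02:00.1": 0x2810}
    for bdf, val in walk_and_disable(children).items():
        print(f"{bdf}: DEVCTL {val:#06x}")
```

Only the low four bits change, so every such write touches a register in the PCIe capability of each child function, which is why the read-only flag on that capability matters here.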

If the AER IRQs can't be partitioned along device ownership, we will
have to catch and reinject them, I suppose. How complex could this be?
Would we have to emulate a root port for non-root cells then? In any
case, exploring the thing along this line seems reasonable.

Is there a way to emulate AER events? Hmm, QEMU seems to have some logic
for that (monitor command "pcie_aer_inject_error").

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
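
To make the "catch and reinject" idea a bit more concrete, here is a rough Python sketch of the routing decision alone. It assumes the hypervisor has already read the root port's Root Error Status and Error Source Identification registers; the bit positions are the ones in the PCIe spec, but the ownership table, the cell names and the function names are hypothetical, and nothing here is existing Jailhouse code.

```python
# Sketch only, not hypervisor code. Register layout per the PCIe spec:
# Error Source Identification holds the ERR_COR requester ID in bits 15:0
# and the ERR_FATAL/NONFATAL requester ID in bits 31:16. The ownership
# table and cell names are hypothetical.

PCI_ERR_ROOT_COR_RCV = 0x00000001    # Root Error Status: ERR_COR received
PCI_ERR_ROOT_UNCOR_RCV = 0x00000004  # Root Error Status: ERR_FATAL/NONFATAL received


def decode_bdf(requester_id):
    """Split a 16-bit requester ID into (bus, device, function)."""
    return (requester_id >> 8) & 0xFF, (requester_id >> 3) & 0x1F, requester_id & 0x7


def route_aer_event(root_status, err_src, owners, default="root"):
    """Return a list of (cell, kind, (bus, dev, fn)) for the reported sources.

    `owners` maps 16-bit requester IDs to the name of the owning cell;
    IDs that are not listed fall back to `default`.
    """
    targets = []
    if root_status & PCI_ERR_ROOT_COR_RCV:
        rid = err_src & 0xFFFF
        targets.append((owners.get(rid, default), "correctable", decode_bdf(rid)))
    if root_status & PCI_ERR_ROOT_UNCOR_RCV:
        rid = (err_src >> 16) & 0xFFFF
        targets.append((owners.get(rid, default), "uncorrectable", decode_bdf(rid)))
    return targets


if __name__ == "__main__":
    owners = {0x0300: "inmate"}  # 03:00.0 assigned to a non-root cell
    for cell, kind, bdf in route_aer_event(0x5, 0x03000208, owners):
        print(cell, kind, "%02x:%02x.%x" % bdf)
```

The lookup itself is simple. The open questions from the thread are untouched by it: how the event would be delivered to a non-root cell, and whether that requires emulating a root port for it.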
