On 2017-09-14 02:28, Gustavo Lima Chaves wrote: > * Jan Kiszka <[email protected]> [2017-08-31 13:24:20 +0000]: > >> On 2017-08-31 02:15, Otavio Pontes wrote: >>> Use length defined by PCI specification for AER (Advanced Error >>> Reporting) capability. This is relevant if PCI device is assigned for an >>> inmate cell and this capability is accessed. >>> >>> Capability is set as RW, because there are relevant fields that are >>> read-write. >>> --- >>> tools/jailhouse-config-create | 12 ++++++++++++ >>> 1 file changed, 12 insertions(+) >>> >>> diff --git a/tools/jailhouse-config-create b/tools/jailhouse-config-create >>> index ce2affc..7fd37ab 100755 >>> --- a/tools/jailhouse-config-create >>> +++ b/tools/jailhouse-config-create >>> @@ -226,6 +226,7 @@ class PCICapability: >>> def parse_pcicaps(dir): >>> caps = [] >>> has_extended_caps = False >>> + tlp_prefix_supported = False >>> f = input_open(os.path.join(dir, 'config'), 'rb') >>> f.seek(0x06) >>> (status,) = struct.unpack('<H', f.read(2)) >>> @@ -259,6 +260,11 @@ class PCICapability: >>> (cap_reg,) = struct.unpack('<H', f.read(2)) >>> if (cap_reg & 0xf) >= 2: # v2 capability >>> len = 60 >>> + f.seek(cap + 0x24) >>> + (dev_cap2,) = struct.unpack('<xB', f.read(2)) >>> + if dev_cap2 & 0x20 != 0: >> >> Why not reading the whole cap registers (32 bits) and doing the test >> with a mask that derives from the spec (1 << 20)? This is very hard to >> read, and I would say it's even incorrect (you are testing bit 13, no?). >> >>> + tlp_prefix_supported = True >>> + >>> # access side effects still need to be analyzed >>> flags = PCICapability.RD >>> has_extended_caps = True >>> @@ -297,6 +303,12 @@ class PCICapability: >>> len = 64 >>> # access side effects still need to be analyzed >>> flags = PCICapability.RD >>> + elif id == 0x0001: # AER >>> + if tlp_prefix_supported: >>> + len = 0x48 >>> + else: >>> + len = 0x38 >> >> len = 0x48 if tlp_prefix_supported else 0x38 >> >>> + flags = PCICapability.RW >> >> Reading the spec, AER seems to be per-function, indeed. However, the >> question remains what will the error reports cause. We don't have >> infrastructure to catch them in the hypervisor, so they will most likely >> continue to hit the root cell, even if they originate from a functions >> that no longer belongs to it. >> >> On the other hand, denying write access alone does not solve the issue >> either unless we enforce disabling of AER on device hand-over. Where you >> seeing intercepted write accesses during Jailhouse operation? > > Yes, we were, when splitting some network adaptors (PCI functions). > Otavio says the errors come even if the drivers for those adaptors are > not present on both root/inmate Linux instances, what is funny. The > errors seem to be non-recoverable and non-fatal. Problem is the > partitioning can't even be made without support for the capability. > > I'm new to this (PCI-handling) field in Jailhouse, so I might need > some light. I took a quick look at > https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf to see > at least some implementation direction on AERs. Now, how TODO entry > (in Jailhouse) "PCI AER", under "Hardware error handling" should be > attacked?
An important but yet unaddressed topic for Jailhouse is error discovery, reporting or even recovery, specifically when going safe. AER is apparently the feature that allows you diagnose PCI problems. If this feature can be cleanly partitioned at function level, and no hypervisor interaction is required during runtime, we can probably leave the task up to the cell itself. If not, the hypervisor should eventually take care of catching and forwarding those reports. > Is "disabling of AER on device hand-over" something all PCI > devices support by spec? To may understanding, each device complying with the spec will have a well-defined way to control this, without having to know all other device details. > The fact that the root bridge will always > stay in the root cell is challenging as well... Yes, we may need some effort on bridge or root hub sides if those reports are shared between devices of different cells. That would definitely be a classic hypervisor job then. > > I'll keep updating myself on those issues at my side too. > > Thanks a lot. > Any work on this would be very much appreciated! Thanks, Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux -- You received this message because you are subscribed to the Google Groups "Jailhouse" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
