* Jan Kiszka <[email protected]> [2017-09-14 08:29:30 +0200]:

[...]

> >> continue to hit the root cell, even if they originate from a functions
> >> that no longer belongs to it.
> >>
> >> On the other hand, denying write access alone does not solve the issue
> >> either unless we enforce disabling of AER on device hand-over. Where you
> >> seeing intercepted write accesses during Jailhouse operation?
> > 
> > Yes, we were, when splitting some network adaptors (PCI functions).
> > Otavio says the errors come even if the drivers for those adaptors are
> > not present on both root/inmate Linux instances, what is funny. The
> > errors seem to be non-recoverable and non-fatal. Problem is the
> > partitioning can't even be made without support for the capability.
> > 
> > I'm new to this (PCI-handling) field in Jailhouse, so I might need
> > some light. I took a quick look at
> > https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf to see
> > at least some implementation direction on AERs. Now, how TODO entry
> > (in Jailhouse) "PCI AER", under "Hardware error handling" should be
> > attacked?
> 
> An important but yet unaddressed topic for Jailhouse is error discovery,
> reporting or even recovery, specifically when going safe. AER is
> apparently the feature that allows you diagnose PCI problems. If this
> feature can be cleanly partitioned at function level, and no hypervisor
> interaction is required during runtime, we can probably leave the task
> up to the cell itself. If not, the hypervisor should eventually take
> care of catching and forwarding those reports.

So more on this...

If the root cell is configured with CONFIG_PCIEAER=y, its .remove()
function will be issued at the time Jailhouse claims root complex
devices, which leads to aer_disable_rootport(). That will walk all
child functions, writing to their PCIe capability (Device Control
Register), disabling error reporting for all of them.

This is the first issue, once the pristine access flag for this
capability on the config creating tool is RO.

It then proceeds to write on the root complex' AER capability
registers Root Error Command/Status. Again, this capability is
currently RO from Jailhouse.

Making this latter capability RW at the config creation tool would be
harmless IMO, since everything there seems to be specific to the
function and, moreover, Linux inmates won't be able to access PCIe
Extended Capabilities anyway.

Is my last statement correct? Because from what I see,
jailhouse_setup_data does not pass anything about mmcfg_start/size,
now, so Linux inmates are left to PIO-only access to PCI configuration
space, right? PCI manual says:

"PCI Express Extended Capability registers are located in Configuration
Space at offsets 256 or greater as shown in Figure 7-30 or in the Root
Complex Register Block (RCRB). These registers when located in the
Configuration Space are accessible using only the PCI Express Extended
Configuration Space flat memory-mapped access mechanism."

and line

                address = (addr_port_val & PCI_ADDR_REGNUM_MASK) +
                        port - PCI_REG_DATA_PORT;


at x86/pci.c, makes it clear that a 256 bit mask is indeed the case
for those accesses.

This also makes it clear that what was bugging us on our PCI
partitioning is not AER events generated at partitioning time, but
those from the PCIAER driver at the root cell, as there's not even a
way now for Linux inmates to flag AER errors (via PIO).

One last interesting thing is that after our partitioning, dmesg on
the root cell shows something like:

[  258.819503] pcieport 0000:00:17.0: AER: Multiple Uncorrected (Non-Fatal) 
error received: id=0000
[  258.819533] pcieport 0000:00:17.0: can't find device of ID0000
[  258.819816] Created Jailhouse cell "linux-x86-demo"

If I get this right, the PCIAER driver again (at
aer_disable_rootport()) is calling flush_work() on the aer_isr
routine, and it seems there was activity there during root-cell only
runtime.

As a WIP change, we have that patch making the config creation tool to
properly size (and make RW) the AER capability.

RW access to the PCIe one could indeed be left to a manual step to the
user (I'd document that better in the code, maybe). I went on to code
something disabling AER reporting by Jailhouse and ignoring writes to
the Control register by cells while it was up, but when I saw inmates
would not even be able to sign AERs I quit that.

Thoughts/corrections to that?

-- 
Gustavo Lima Chaves
Intel - Open Source Technology Center

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to