On 20.05.14 13:56, Gavin Shan wrote:
On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote:
On 20.05.14 10:30, Gavin Shan wrote:
If we detect a frozen state on a PE that has been passed to a guest, we
needn't handle it. Instead, we rely on the guest to detect and recover
from it. The patch avoids raising an EEH event on the frozen passed PE so
that the guest has a chance to handle it.

Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
How does the guest learn about this failure? We'd need to inject an
error into it, no?

When an error exists at the HW level, reads of PCI config space or memory
BARs return all 0xFF's. On seeing 0xFF's from such a read, the guest
retrieves the failure state, which HW captures automatically, via the RTAS
call "ibm,read-slot-reset-state2". If "ibm,read-slot-reset-state2" reports
an error in HW, the guest kernel starts recovery.
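
To make the "read all 0xFF's, then ask for the slot state" sequence concrete, here is a small toy model in Python. It is only a sketch of the control flow described above: ToyPE, guest_read32, the register value and the "frozen"/"normal" strings are all made up for illustration, and read_slot_reset_state2() merely stands in for the RTAS call; none of this is kernel or RTAS code.

```python
ALL_ONES = 0xFFFFFFFF  # what a 32-bit read looks like when the PE is frozen


class ToyPE:
    """Hypothetical model of a PE; not a kernel or RTAS structure."""

    def __init__(self):
        self.frozen = False
        self.regs = {0x00: 0x12348086}  # made-up vendor/device ID register

    def read32(self, offset):
        # A frozen PE returns all 0xFF's for config space / BAR reads.
        if self.frozen:
            return ALL_ONES
        return self.regs.get(offset, 0)

    def read_slot_reset_state2(self):
        # Stand-in for the "ibm,read-slot-reset-state2" RTAS call.
        return "frozen" if self.frozen else "normal"

    def reset(self):
        # Stand-in for whatever recovery the guest kernel performs.
        self.frozen = False


def guest_read32(pe, offset, log):
    """Guest-side read: only query the slot state after seeing all 0xFF's."""
    value = pe.read32(offset)
    if value == ALL_ONES and pe.read_slot_reset_state2() == "frozen":
        log.append("recovery started")
        pe.reset()
        value = pe.read32(offset)  # retry once the PE is usable again
    return value
```

In this model nothing is detected until guest_read32() is actually called on the frozen PE, which is the "passive" property of the scheme.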

This can be called "passive" reporting. There is possibly one case in which
the error is never reported: no device driver is bound to the VFIO PCI
device and nothing accesses the device's config space or memory BARs.
However, that doesn't matter. As we don't use the device, we needn't detect
and recover from the error at all.

So if the guest is waiting for an interrupt to happen it will wait forever? Not really nice.

I think what you want is an irqfd that the in-kernel eeh code
notifies when it sees a failure. When such an fd exists, the kernel
skips its own error handling.
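
As a rough illustration of that idea, the following toy model in Python shows only the decision being suggested: if a notifier has been registered for a PE, the failure is forwarded and host-side handling is skipped; otherwise the host handles it. ToyEehCore, register_notifier and on_frozen_pe are hypothetical names, and a plain Python callable stands in for the irqfd/eventfd; this is not the kernel EEH or KVM API.

```python
class ToyEehCore:
    """Hypothetical stand-in for the in-kernel EEH code."""

    def __init__(self):
        self.notifiers = {}       # PE number -> callable standing in for an irqfd
        self.host_recovered = []  # PEs the host handled itself

    def register_notifier(self, pe, notify):
        # In a real design this would be user space handing the kernel an fd.
        self.notifiers[pe] = notify

    def on_frozen_pe(self, pe):
        notify = self.notifiers.get(pe)
        if notify is not None:
            # A notifier exists: signal it and skip the host's own handling.
            notify(pe)
            return "forwarded"
        # No notifier: fall back to host-side error handling.
        self.host_recovered.append(pe)
        return "host-handled"
```

With this shape, the presence of a registered notifier is itself the marker that a PE belongs to a guest, so no separate flag is consulted in on_frozen_pe().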

Yeah, it's a good idea and something for me to improve in phase II. We
can discuss it more later.

I think it makes sense to at least start moving in that direction
immediately. The reason I brought it up in the context of this patch is
that with an irqfd you wouldn't need the passed flag at all.

For now, what I have in my head is something like this:

       [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU
                                                            |
                                    -------------(A)---------
                                    |
                         Send one EEH event to guest kernel
                                    |
                         Guest kernel starts the recovery

(A): I haven't figured out a convenient way to do the EEH event injection yet.

How does the guest learn about errors in pHyp?


Alex
