Re: [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Alex Williamson
On Thu, 6 Apr 2017 16:53:44 +0800 Cao jin wrote: > On 04/06/2017 06:36 AM, Michael S. Tsirkin wrote: > > On Wed, Apr 05, 2017 at 04:19:10PM -0600, Alex Williamson wrote: > >> On Thu, 6 Apr 2017 00:50:22 +0300 > >> "Michael S. Tsirkin" wrote: > >> > >>> On Wed, Apr 05, 2017 at 01:38:22PM

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Alex Williamson
On Thu, 6 Apr 2017 16:49:35 +0800 Cao jin wrote: > On 04/06/2017 05:56 AM, Michael S. Tsirkin wrote: > > On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote: > >> Apparently, I don't have experience to induce non-fatal error, device > >> error is more of a chance related with the

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin
On 04/06/2017 03:38 AM, Alex Williamson wrote: > On Wed, 5 Apr 2017 16:54:33 +0800 > Cao jin wrote: > >> Sorry for late. Distracted by other problem for a while. >> >> On 03/31/2017 02:16 AM, Alex Williamson wrote: >>> On Thu, 30 Mar 2017 21:00:35 +0300 >> > >

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin
On 04/06/2017 05:56 AM, Michael S. Tsirkin wrote: > On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote: >> Apparently, I don't have experience to induce non-fatal error, device >> error is more of a chance related with the environment(temperature, >> humidity, etc) as I understand. > > I'm

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin
On 04/06/2017 06:36 AM, Michael S. Tsirkin wrote: > On Wed, Apr 05, 2017 at 04:19:10PM -0600, Alex Williamson wrote: >> On Thu, 6 Apr 2017 00:50:22 +0300 >> "Michael S. Tsirkin" wrote: >> >>> On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote: The previous intention of trying

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Alex Williamson
On Thu, 6 Apr 2017 01:36:31 +0300 "Michael S. Tsirkin" wrote: > On Wed, Apr 05, 2017 at 04:19:10PM -0600, Alex Williamson wrote: > > On Thu, 6 Apr 2017 00:50:22 +0300 > > "Michael S. Tsirkin" wrote: > > > > > On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote: > > > > The

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 04:19:10PM -0600, Alex Williamson wrote: > On Thu, 6 Apr 2017 00:50:22 +0300 > "Michael S. Tsirkin" wrote: > > > On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote: > > > The previous intention of trying to handle all sorts of AER faults > > > clearly had

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Alex Williamson
On Thu, 6 Apr 2017 00:50:22 +0300 "Michael S. Tsirkin" wrote: > On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote: > > The previous intention of trying to handle all sorts of AER faults > > clearly had more value, though even there the implementation and > > configuration

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote: > Apparently, I don't have experience to induce non-fatal error, device > error is more of a chance related with the environment(temperature, > humidity, etc) as I understand. I'm not sure how to interpret this statement. I think what Alex

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote: > The previous intention of trying to handle all sorts of AER faults > clearly had more value, though even there the implementation and > configuration requirements restricted the practicality. For instance > is AER support actually

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Alex Williamson
On Wed, 5 Apr 2017 16:54:33 +0800 Cao jin wrote: > Sorry for late. Distracted by other problem for a while. > > On 03/31/2017 02:16 AM, Alex Williamson wrote: > > On Thu, 30 Mar 2017 21:00:35 +0300 > > >> > >>> > >>> I also asked in my previous comments to provide examples

Re: [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Cao jin
Sorry for late. Distracted by other problem for a while. On 03/31/2017 02:16 AM, Alex Williamson wrote: > On Thu, 30 Mar 2017 21:00:35 +0300 >> >>> >>> I also asked in my previous comments to provide examples of errors that >>> might trigger correctable errors to the user,

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-30 Thread Alex Williamson
On Thu, 30 Mar 2017 21:00:35 +0300 "Michael S. Tsirkin" wrote: > On Tue, Mar 28, 2017 at 08:55:13PM -0600, Alex Williamson wrote: > > On Wed, 29 Mar 2017 03:01:48 +0300 > > "Michael S. Tsirkin" wrote: > > > > > On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote: > > > > On

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-30 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 08:55:13PM -0600, Alex Williamson wrote: > On Wed, 29 Mar 2017 03:01:48 +0300 > "Michael S. Tsirkin" wrote: > > > On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote: > > > On Tue, 28 Mar 2017 21:47:00 +0800 > > > Cao jin wrote: > > > > > > > On 03/25/2017

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Alex Williamson
On Wed, 29 Mar 2017 03:01:48 +0300 "Michael S. Tsirkin" wrote: > On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote: > > On Tue, 28 Mar 2017 21:47:00 +0800 > > Cao jin wrote: > > > > > On 03/25/2017 06:12 AM, Alex Williamson wrote: > > > > On Thu, 23 Mar 2017 17:07:31 +0800 >

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2017 at 10:12:33AM -0600, Alex Williamson wrote: > On Tue, 28 Mar 2017 21:47:00 +0800 > Cao jin wrote: > > > On 03/25/2017 06:12 AM, Alex Williamson wrote: > > > On Thu, 23 Mar 2017 17:07:31 +0800 > > > Cao jin wrote: > > > > > > A more appropriate patch subject would be: > > >

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Alex Williamson
On Tue, 28 Mar 2017 21:47:00 +0800 Cao jin wrote: > On 03/25/2017 06:12 AM, Alex Williamson wrote: > > On Thu, 23 Mar 2017 17:07:31 +0800 > > Cao jin wrote: > > > > A more appropriate patch subject would be: > > > > vfio-pci: Report correctable errors and slot reset events to user > > > >

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Cao jin
On 03/25/2017 06:12 AM, Alex Williamson wrote: > On Thu, 23 Mar 2017 17:07:31 +0800 > Cao jin wrote: > > A more appropriate patch subject would be: > > vfio-pci: Report correctable errors and slot reset events to user > Correctable? It is confusing to me. Correctable error has its clear

Re: [PATCH v6] vfio error recovery: kernel support

2017-03-24 Thread Alex Williamson
On Thu, 23 Mar 2017 17:07:31 +0800 Cao jin wrote: A more appropriate patch subject would be: vfio-pci: Report correctable errors and slot reset events to user > From: "Michael S. Tsirkin" This hardly seems accurate anymore. You could say Suggested-by and let Michael add a sign-off, but it's

[PATCH v6] vfio error recovery: kernel support

2017-03-23 Thread Cao jin
From: "Michael S. Tsirkin" 0. What happens now (PCIE AER only) Fatal errors cause a link reset. Non fatal errors don't. All errors stop the QEMU guest eventually, but not immediately, because it's detected and reported asynchronously. Interrupts are forwarded as usual. Correctable