On 11/30/2016 09:46 AM, Michael S. Tsirkin wrote:
> On Mon, Nov 28, 2016 at 05:32:15PM +0800, Cao jin wrote:
>>
>>

>>>
>>>> +  if (severity == AER_FATAL && strcmp(dev->driver->name, "vfio-pci")) {
>>>
>>> You really want some flag in the device, or something similar.
>>> Also, how do we know driver is not going away at this point?
>>>
>>
>> I didn't think of this condition, and I don't quite follow how would driver
>> go away?(device has error happened, then is removed?)
> 
> Yes - hotplug request detected. Does something prevent this?
> 

I wasn't realized there would be possible to have hotplug happened
during error recovery. But, it seems is the aer driver's thing to
consider it?

>>>>            result = reset_link(dev);
>>>>            if (result != PCI_ERS_RESULT_RECOVERED)
>>>>                    goto failed;
>>
>>>> @@ -1187,10 +1200,30 @@ static pci_ers_result_t 
>>>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>>            return PCI_ERS_RESULT_DISCONNECT;
>>>>    }
>>>>
>>>> +  /* get device's uncorrectable error status as soon as possible,
>>>> +   * and signal it to user space. The later we read it, the possibility
>>>> +   * the register value is mangled grows. */
>>>> +  aer_cap_offset = pci_find_ext_capability(vdev->pdev, 
>>>> PCI_EXT_CAP_ID_ERR);
>>>> +  ret = pci_read_config_dword(vdev->pdev, aer_cap_offset +
>>>> +                                    PCI_ERR_UNCOR_STATUS, &uncor_status);
>>>> +        if (ret)
>>>> +                return PCI_ERS_RESULT_DISCONNECT;
>>>> +
>>>> +  pr_err("device %d got AER detect notification. uncorrectable error 
>>>> status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
>>>>    mutex_lock(&vdev->igate);
>>>> +
>>>> +  vdev->aer_recovering = true;
>>>> +  reinit_completion(&vdev->aer_error_completion);
>>>> +
>>>> +  /* suspend config space access from user space,
>>>> +   * when vfio-pci's error recovery process is on */
>>>
>>> what about access to memory etc? Do you need to suspend this as well?
>>>
>>
>> Yes, this question came into my mind a little bit, but I didn't see some
>> existing APIs like pci_cfg_access_xxx which can help to do this.(I am still
>> not familiar with kernel)
> 
> This isn't easy to do at all.
> 

I guess we can use completion in vfio_pci_rw to block all kinds of
access during vfio's error recovery. But from another perspective, now
that we have disabled reset_link in host, I think the access to device
could be performed normally. To me, the benefit of using pci_cfg_access
is: we won't bother to do "wait 3s to do guest's link reset"

> 
>>>> +  pci_cfg_access_trylock(vdev->pdev);
>>>
>>> If you trylock, you need to handle failure.
>>
>> try lock returns 0 if access is already locked, 1 otherwise. Is it necessary
>> to check its return value?
> 
> Locked by whom? You blissfully access as if it's locked by you.
> 
>>
>> -- 
>> Sincerely,
>> Cao jin
>>
> 
> 
> .
> 

-- 
Sincerely,
Cao jin


Reply via email to