Hi Alex,

Yes, I hope kernel can disable SR-IOV and related VF resource allocation if the 
system BIOS is not SR-IOV capable.

Adding the parameter "pci=nosriov" sounds a doable solution, but it would need 
user to add this parameter manually, right? I think an automatic detection 
would be better. My patch is trying to auto detect and bypass VF resource 
allocation.


-Collins Cheng


-----Original Message-----
From: Alexander Duyck [mailto:alexander.du...@gmail.com] 
Sent: Friday, May 19, 2017 11:44 PM
To: Alex Williamson <alex.william...@redhat.com>
Cc: Cheng, Collins <collins.ch...@amd.com>; Bjorn Helgaas 
<bhelg...@google.com>; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
Deucher, Alexander <alexander.deuc...@amd.com>; Zytaruk, Kelly 
<kelly.zyta...@amd.com>; Yinghai Lu <ying...@kernel.org>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV 
incapable platform

On Mon, May 15, 2017 at 10:53 AM, Alex Williamson <alex.william...@redhat.com> 
wrote:
> On Mon, 15 May 2017 08:19:28 +0000
> "Cheng, Collins" <collins.ch...@amd.com> wrote:
>
>> Hi Williamson,
>>
>> We cannot assume BIOS supports SR-IOV, actually only newer server 
>> motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or older 
>> server motherboard BIOS doesn't support SR-IOV. This issue would happen if 
>> an user plugs our AMD SR-IOV capable GPU card to a normal desktop 
>> motherboard.
>
> Servers should be supporting SR-IOV for a long time now.  What really 
> is there to a BIOS supporting SR-IOV anyway, it's simply reserving 
> sufficient bus number and MMIO resources such that we can enable the 
> VFs.  This process isn't exclusively reserved for the BIOS.  Some 
> platforms may choose to only initialize boot devices, leaving the rest 
> for the OS to program.  The initial proposal here to disable SR-IOV if 
> not programmed at OS hand-off disables even the possibility of the OS 
> reallocating resources for this device.

There are differences between supporting SR-IOV and supporting SR-IOV on 
devices with massive resources. I know I have seen NICs that will keep a system 
from completing POST if SR-IOV is enabled, and MMIO beyond 4G is not. My guess 
would be that the issues being seen are probably that they disable SR-IOV in 
the BIOS in such a setup and end up running into issues when they try to boot 
into the Linux kernel as it goes through and tries to allocate resources for 
SR-IOV even though it was disabled in the BIOS.

It might make sense to add a kernel parameter something like a "pci=nosriov" 
that would allow for disabling SR-IOV and related resource allocation if that 
is what we are talking about. That way you could plug in these types of devices 
into a system with a legacy bios or that doesn't wan to allocate addresses 
above 32b for MMIO, and this parameter would be all that is needed to disable 
SR-IOV so you could plug in a NIC that has SR-IOV associated with it.

>> I agree that failure to allocate VF resources should leave the device in no 
>> worse condition than before it tried. I hope kernel could allocate PF device 
>> resource before allocating VF device resource, and keep PF device resource 
>> valid and functional if failed to allocate VF device resource.
>>
>> I will send out dmesg log lspci info tomorrow. Thanks.
>
> Thanks,
> Alex
>
>> -----Original Message-----
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Friday, May 12, 2017 10:43 PM
>> To: Cheng, Collins <collins.ch...@amd.com>
>> Cc: Bjorn Helgaas <bhelg...@google.com>; linux-...@vger.kernel.org; 
>> linux-kernel@vger.kernel.org; Deucher, Alexander 
>> <alexander.deuc...@amd.com>; Zytaruk, Kelly <kelly.zyta...@amd.com>; 
>> Yinghai Lu <ying...@kernel.org>
>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
>> SR-IOV incapable platform
>>
>> On Fri, 12 May 2017 04:51:43 +0000
>> "Cheng, Collins" <collins.ch...@amd.com> wrote:
>>
>> > Hi Williamson,
>> >
>> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV 
>> > NIC. I don't think it is redundant to check the VF BAR valid before call 
>> > sriov_init(), it is safe and saving boot time, also there is no a better 
>> > method to know if system BIOS has correctly initialized the SR-IOV 
>> > capability or not.
>>
>> It also masks an underlying bug and creates a maintenance issue that we 
>> won't know when it's safe to remove this workaround.  I don't think faster 
>> boot is valid rationale, in one case SR-IOV is completely disabled, the 
>> other we attempt to allocate the resources the BIOS failed to provide.  I 
>> expect this is also a corner case, the BIOS should typically support SR-IOV, 
>> therefore this situation should be an exception.
>>
>> > I did not try to fix the issue from the kernel resource allocation 
>> > perspective, it is because:
>> >     1. I am not very familiar with the PCI resource allocation scheme in 
>> > kernel. For example, in sriov_init(), kernel will re-assign the PCI 
>> > resource for both VF and PF. I don't understand why kernel allocates 
>> > resource for VF firstly, then PF. If it is PF firstly, then this issue 
>> > could be avoided.
>> >     2. I am not sure if kernel has error handler if PCI resource 
>> > allocation failed. In this case, kernel cannot allocate enough resource to 
>> > PF. It should trigger some error handler to either just keep original BAR 
>> > values set by system BIOS, or disable this device and log errors.
>>
>> I think these are the issues we should be trying to solve and I'm 
>> sure folks on the linux-pci list can help us identify the bug.  
>> Minimally, failure to allocate VF resources should leave the device 
>> in no worse condition than before it tried.  Perhaps you could post 
>> more details about the issue, boot with pci=earlydump, post dmesg of 
>> a boot where the PF resources are incorrectly re-allocated, and 
>> include lspci -vvv for the SR-IOV device.  Also, please test with the 
>> latest upstream kernel, upstream only patches old kernels through 
>> stable backports of commits to the latest kernel.  Adding Yinghai as 
>> a resource allocation expert. Thanks,
>>
>> Alex
>>
>> > -----Original Message-----
>> > From: Alex Williamson [mailto:alex.william...@redhat.com]
>> > Sent: Friday, May 12, 2017 12:01 PM
>> > To: Cheng, Collins <collins.ch...@amd.com>
>> > Cc: Bjorn Helgaas <bhelg...@google.com>; linux-...@vger.kernel.org; 
>> > linux-kernel@vger.kernel.org; Deucher, Alexander 
>> > <alexander.deuc...@amd.com>; Zytaruk, Kelly <kelly.zyta...@amd.com>
>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
>> > SR-IOV incapable platform
>> >
>> > On Fri, 12 May 2017 03:42:46 +0000
>> > "Cheng, Collins" <collins.ch...@amd.com> wrote:
>> >
>> > > Hi Williamson,
>> > >
>> > > GPU card needs more BAR aperture resource than other PCI devices. For 
>> > > example, Intel SR-IOV network card only require 512KB memory resource 
>> > > for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory 
>> > > resource for frame buffer BAR aperture.
>> > >
>> > > If the system BIOS supports SR-IOV, it will reserve enough resource for 
>> > > all VF BARs. If the system BIOS doesn't support SR-IOV or cannot 
>> > > allocate the enough resource for VF BARs, only PF BAR will be assigned 
>> > > and VF BARs are empty. Then system boots to Linux kernel and kernel 
>> > > doesn't check if the VF BARs are empty or valid. Kernel will re-assign 
>> > > the BAR resources for PF and all VFs. The problem I saw is that kernel 
>> > > will fail to allocate PF BAR resource because some resources are 
>> > > assigned to VF, this is not expected. So kernel might need to do some 
>> > > check before re-assign the PF/VF resource, so that PF device will be 
>> > > correctly assigned BAR resource and user can use PF device.
>> >
>> > So the problem is that something bad happens when the kernel is 
>> > trying to reallocate resources in order to fulfill the requirements 
>> > of the VFs, leaving the PF resources incorrectly programmed?  Why 
>> > not just fix that bug rather than creating special handling for 
>> > this vendor/class of device which disables any attempt to fixup 
>> > resources for SR-IOV?  IOW, this patch just avoids the problem for 
>> > your devices rather than fixing the bug.  I'd suggest fixing the 
>> > bug such that the PF is left in a functional state if the kernel is 
>> > unable to allocate sufficient resources for the VFs.  Thanks,
>> >
>> > Alex
>> >
>> > > -----Original Message-----
>> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
>> > > Sent: Friday, May 12, 2017 11:21 AM
>> > > To: Cheng, Collins <collins.ch...@amd.com>
>> > > Cc: Bjorn Helgaas <bhelg...@google.com>; 
>> > > linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, 
>> > > Alexander <alexander.deuc...@amd.com>; Zytaruk, Kelly 
>> > > <kelly.zyta...@amd.com>
>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
>> > > SR-IOV incapable platform
>> > >
>> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins" 
>> > > <collins.ch...@amd.com> wrote:
>> > >
>> > > > Hi Helgaas,
>> > > >
>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 
>> > > > platform. It would cause a problem on PCI resource allocation 
>> > > > in current Linux kernel.
>> > > >
>> > > > Therefore in order to allow the PF (Physical Function) device 
>> > > > of SR-IOV capable GPU to work on the SR-IOV incapable platform, 
>> > > > it is required to verify conditions for initializing BAR 
>> > > > resources on AMD SR-IOV capable GPUs.
>> > > >
>> > > > If the device is an AMD graphics device and it supports SR-IOV 
>> > > > it will require a large amount of resources.
>> > > > Before calling sriov_init() must ensure that the system BIOS 
>> > > > also supports SR-IOV and that system BIOS has been able to 
>> > > > allocate enough resources.
>> > > > If the VF BARs are zero then the system BIOS does not support 
>> > > > SR-IOV or it could not allocate the resources and this platform 
>> > > > will not support AMD graphics SR-IOV.
>> > > > Therefore do not call sriov_init().
>> > > > If the system BIOS does support SR-IOV then the VF BARs will be 
>> > > > properly initialized to non-zero values.
>> > > >
>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
>> > > >
>> > > > I checked the drivers/pci/quirks.c, it looks the 
>> > > > workarounds/fixes in quirks.c are for specific devices and one 
>> > > > or more device ID are defined for the specific devices. However 
>> > > > my patch is for all AMD SR-IOV capable GPUs, that includes all 
>> > > > existing and future AMD server GPUs.
>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.
>> > >
>> > >
>> > > Why is an AMD graphics card unique here?  Doesn't sriov_init() 
>> > > always need to be able to deal with devices of any type where the 
>> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV 
>> > > devices can fit their VFs within a minimum bridge aperture, most 
>> > > cannot.  I don't understand why the VF resource requirements 
>> > > being exceptionally large dictates that they receive special handling.
>> > > Thanks,
>> > >
>> > > Alex
>> > >
>> > > > Signed-off-by: Collins Cheng <collins.ch...@amd.com>
>> > > > ---
>> > > >  drivers/pci/iov.c | 63
>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>> > > >  1 file changed, 60 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index
>> > > > e30f05c..e4f1405 100644
>> > > > --- a/drivers/pci/iov.c
>> > > > +++ b/drivers/pci/iov.c
>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev 
>> > > > *dev)
>> > > >                 msleep(100);
>> > > >  }
>> > > >
>> > > > +/*
>> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
>> > > > + * @dev: the PCI device
>> > > > + * @pos: register offset of SR-IOV capability in PCI config 
>> > > > +space
>> > > > + * Returns true any VF BAR has resource allocated, false
>> > > > + * if all VF BARs are empty.
>> > > > + */
>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
>> > > > +       int i;
>> > > > +       u32 bar_value;
>> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |
>> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);
>> > > > +
>> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 
>> > > > 4, &bar_value);
>> > > > +               if (bar_value & bar_size_mask)
>> > > > +                       return true;
>> > > > +       }
>> > > > +
>> > > > +       return false;
>> > > > +}
>> > > > +
>> > > > +/*
>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU 
>> > > > +device
>> > > > + * @dev: the PCI device
>> > > > + *
>> > > > + * Returns true if device is an AMD/ATI display adapter,
>> > > > + * otherwise return false.
>> > > > + */
>> > > > +
>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
>> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||
>> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }
>> > > > +
>> > > >  /**
>> > > >   * pci_iov_init - initialize the IOV capability
>> > > >   * @dev: the PCI device
>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>> > > >                 return -ENODEV;
>> > > >
>> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> > > > -       if (pos)
>> > > > -               return sriov_init(dev, pos);
>> > > > -
>> > > > +       if (pos) {
>> > > > +       /*
>> > > > +        * If the device is an AMD graphics device and it supports
>> > > > +        * SR-IOV it will require a large amount of resources.
>> > > > +        * Before calling sriov_init() must ensure that the system
>> > > > +        * BIOS also supports SR-IOV and that system BIOS has been
>> > > > +        * able to allocate enough resources.
>> > > > +        * If the VF BARs are zero then the system BIOS does not
>> > > > +        * support SR-IOV or it could not allocate the resources
>> > > > +        * and this platform will not support AMD graphics SR-IOV.
>> > > > +        * Therefore do not call sriov_init().
>> > > > +        * If the system BIOS does support SR-IOV then the VF BARs
>> > > > +        * will be properly initialized to non-zero values.
>> > > > +        */
>> > > > +               if (is_amd_display_adapter(dev)) {
>> > > > +                       if (pci_vf_bar_valid(dev, pos))
>> > > > +                               return sriov_init(dev, pos);
>> > > > +               } else {
>> > > > +                       return sriov_init(dev, pos);
>> > > > +               }
>> > > > +       }
>> > > >         return -ENODEV;
>> > > >  }
>> > > >
>> > >
>> >
>>
>

Reply via email to