On 02/22/16 18:21, Brian J. Johnson wrote:
> Here's another example of a bare metal machine with multiple PCI roots,
> although they do not share resources (SGI UV1000, edited for brevity):

[snip an incredible amount of devices]

Does this supercomputer fit in a van? :)

> On UV machines, only the legacy socket on segment 0 (implemented by host
> bridge 0, and containing the southbridge) is addressable in 32-bit PCIe
> config space. The other sockets each get two segments at high
> addresses, one for I/O and one for socket internal devices (memory
> controllers, etc.) Each socket is allocated a disjoint 32-bit MMIO
> range, 64-bit MMIO range, and I/O port range as needed.
>
> We'd like the generic PCI code to have full support for complex
> topologies like this. For example:
> - Assume the presence of dozens if not hundreds of segments and root
>   bridges

The new central driver (that we're porting OVMF to), i.e.,
MdeModulePkg/Bus/Pci/PciHostBridgeDxe appears to install only one
instance of EFI_PCI_HOST_BRIDGE_RESOURCE_ALLOCATION_PROTOCOL but I think
it shouldn't necessarily be a problem. The code has a comment saying

  Most systems in the world including complex servers have only one Host
  Bridge

If that doesn't apply to (all) SGI machines, I *think* the driver can be
generalized later, incrementally.

And, as far as the assumption that 1 instance of
EFI_PCI_HOST_BRIDGE_RESOURCE_ALLOCATION_PROTOCOL suffices is valid, your
requirement above seems to be covered. Namely, PciHostBridgeDxe
delegates the reporting of all root bridges to PciHostBridgeLib, which
the platform can provide. The important API in that lib class is
PciHostBridgeGetRootBridges(), which is supposed to return a dynamically
allocated array of PCI_ROOT_BRIDGE structures, describing the various
types of apertures that each root bridge has.

The PCI_ROOT_BRIDGE struct has a field called Segment, and it should be
possible to return an array that has dozens of this struct. Please see
"MdeModulePkg/Include/Library/PciHostBridgeLib.h".
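
To make the shape of such an API concrete, here is a minimal standalone
sketch in plain C. It is not the EDK II code: ROOT_BRIDGE_SKETCH,
APERTURE_SKETCH and GetRootBridgesSketch() are hypothetical stand-ins,
the MMIO windows are made-up values, and only the Segment field and the
"dynamically allocated array plus element count" contract are taken from
the description above. The real definitions live in PciHostBridgeLib.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one aperture: an inclusive [Base, Limit] range. */
typedef struct {
  uint64_t Base;
  uint64_t Limit;
} APERTURE_SKETCH;

/* Hypothetical, heavily simplified stand-in for PCI_ROOT_BRIDGE. */
typedef struct {
  uint32_t        Segment;  /* PCI segment this root bridge lives in */
  APERTURE_SKETCH Bus;      /* bus number range */
  APERTURE_SKETCH Mem;      /* 32-bit MMIO window (made-up values) */
} ROOT_BRIDGE_SKETCH;

/*
 * Hypothetical platform hook modeled on the contract described above:
 * return a heap-allocated array with one entry per segment and store
 * the element count in *Count. Returns NULL and sets *Count to 0 on
 * failure. The caller owns the array and releases it with free().
 */
ROOT_BRIDGE_SKETCH *
GetRootBridgesSketch (uint32_t SegmentCount, size_t *Count)
{
  ROOT_BRIDGE_SKETCH *Bridges;
  uint32_t           Index;

  assert (Count != NULL);
  *Count = 0;
  if (SegmentCount == 0) {
    return NULL;
  }

  /* The array is sized at run time, so there is no fixed upper bound. */
  Bridges = calloc (SegmentCount, sizeof *Bridges);
  if (Bridges == NULL) {
    return NULL;
  }

  for (Index = 0; Index < SegmentCount; Index++) {
    Bridges[Index].Segment   = Index;
    Bridges[Index].Bus.Base  = 0;
    Bridges[Index].Bus.Limit = 255;
    /* Disjoint 1 MiB MMIO windows starting at 2 GiB, purely illustrative. */
    Bridges[Index].Mem.Base  = 0x80000000ULL + (uint64_t)Index * 0x100000ULL;
    Bridges[Index].Mem.Limit = Bridges[Index].Mem.Base + 0x100000ULL - 1;
  }

  *Count = SegmentCount;
  return Bridges;
}
```

Because the array is sized at run time, a platform with hundreds of
segments needs nothing different from a platform with one, as long as
the consumer iterates over the returned count.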

> - Don't store segment or root bridge lists in small, fixed-size arrays
>   or bitmaps

The PciHostBridgeGetRootBridges() API already returns a dynamically
allocated array (and matching element count), so this should be covered.

> - Use PciSegmentLib rather than PciExpressLib, since the latter supports
>   only one segment

Done. (Not that I wrote it, but still: done. :)) Please see
"MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf".

> - Never use PciCf8Lib, for the same reason
>
> I haven't looked at the generic PciHostBridgeDxe driver in this patch
> series to know if you do that or not... So please don't take this
> message as criticism. I'm just providing another report from the "real"
> world.

I think (with my admittedly limited PCI "expertise", quotes justified)
that the above driver and library class are a really good abstraction.
(I can praise it; it's not my design. :)) For OVMF we need a few tweaks
in the driver code / assumptions (about non-overlapping MMIO and ioport
apertures), but otherwise it looks very promising to us.

Thanks
Laszlo

>
> Thanks,
> Brian
>
>>
>> >
>> > Regards,
>> > Ray
>> >
>> >> -----Original Message-----
>> >> From: Marcel Apfelbaum [mailto:[email protected]]
>> >> Sent: Monday, February 8, 2016 6:56 PM
>> >> To: Ni, Ruiyu <[email protected]>; Laszlo Ersek <[email protected]>
>> >> Cc: Justen, Jordan L <[email protected]>; [email protected];
>> >> Tian, Feng <[email protected]>; Fan, Jeff <[email protected]>
>> >> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
>> >> PciHostBridgeDxe driver.
>> >>
>> >> Hi,
>> >>
>> >> I am sorry for the noise, I am re-sending this mail from an e-mail
>> >> address subscribed to the list.
>> >>
>> >> Thanks,
>> >> Marcel
>> >>
>> >> On 02/08/2016 12:41 PM, Marcel Apfelbaum wrote:
>> >>> On 02/06/2016 09:09 AM, Ni, Ruiyu wrote:
>> >>>> Marcel,
>> >>>> Please see my reply embedded below.
>> >>>>
>> >>>> On 2016-02-02 19:07, Laszlo Ersek wrote:
>> >>>>> On 02/01/16 16:07, Marcel Apfelbaum wrote:
>> >>>>>> On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
>> >>>>>>> Laszlo,
>> >>>>>>> I now understand your problem.
>> >>>>>>> Can you tell me why OVMF needs multiple root bridges support?
>> >>>>>>> My understanding to OVMF is it's a firmware which can be used
>> >>>>>>> in a guest VM environment to boot OS.
>> >>>>>>> Multiple root bridges requirement currently mainly comes from
>> >>>>>>> high-end servers.
>> >>>>>>> Do you mean that the VM guest needs to be like a high-end
>> >>>>>>> server?
>> >>>>>>> This may help me to think about the possible solution to your
>> >>>>>>> problem.
>> >>>>>> Hi Ray,
>> >>>>>>
>> >>>>>> Laszlo's explanation is very good, this is not exactly about
>> >>>>>> high-end VMs, we need the extra root bridges to match assigned
>> >>>>>> devices to their corresponding NUMA node.
>> >>>>>>
>> >>>>>> Regarding the OVMF issue, the main problem is that the extra root
>> >>>>>> bridges are created dynamically for the VMs (command line
>> >>>>>> parameter) and their resources are computed on the fly.
>> >>>>>>
>> >>>>>> Not directly related to the above, the optimal way to allocate
>> >>>>>> resources for PCI root bridges sharing the same PCI domain is to
>> >>>>>> sort devices MEM/IO ranges from the biggest to smallest and use
>> >>>>>> this order during allocation.
>> >>>>>>
>> >>>>>> After the resources allocation is finished we can build the CRS
>> >>>>>> for each PCI root bridge and pass it back to firmware/OS.
>> >>>>>>
>> >>>>>> While for "real" machines we can hard-code the root bridge
>> >>>>>> resources in some ROM and have it extracted early in the boot
>> >>>>>> process, for the VM world this would not be possible. Also any
>> >>>>>> effort to divide the resources range before the resource
>> >>>>>> allocation would be odd and far from optimal.
>> >>
>> >> Hi Ray,
>> >> Thank you for your response,
>> >>
>> >>>> Real machine uses hard-code resources for root bridges. But when
>> >>>> the resource cannot meet certain root bridges' requirement,
>> >>>> firmware can save the real resource requirement per root bridges
>> >>>> to NV storage and divide the resources to each root bridge in
>> >>>> next boot according to the NV settings.
>> >>>> The MMIO/IO routing in the real machine I mentioned above needs
>> >>>> to be fixed in a very early phase before the PciHostBridgeDxe
>> >>>> driver runs. That's to say if [2G, 2.8G) is configured to route
>> >>>> to root bridge #1, only [2G, 2.8G) is allowed to be assigned to
>> >>>> root bridge #1. And the routing cannot be changed unless a
>> >>>> platform reset is performed.
>> >>
>> >> I understand.
>> >>
>> >>>>
>> >>>> Based on your description, it sounds like all the root bridges in
>> >>>> OVMF share the same range of resource and any MMIO/IO in the
>> >>>> range can be routed to any root bridge. For example, every root
>> >>>> bridge can use [2G, 3G) MMIO.
>> >>>
>> >>> Exactly. This is true for "snooping" host-bridges which do not
>> >>> have their own configuration registers (or MMConfig region). They
>> >>> are sniffing host-bridge 0 for configuration cycles and if they
>> >>> are meant for a device on a bus number owned by them, they will
>> >>> forward the transaction to their primary root bus.
>> >>>
>> >>>> Until in allocation phase, root bridge #1 is assigned to
>> >>>> [2G, 2.8G), #2 is assigned to [2.8G, 2.9G), #3 is assigned to
>> >>>> [2.9G, 3G).
>> >>
>> >> Correct, but the regions do not have to be disjoint in the above
>> >> scenario. root bridge #1 can have [2G, 2.4G) and [2.8G, 3G) while
>> >> root bridge #2 can have [2.4G, 2.8G).
>> >>
>> >> This is so the firmware can distribute the resources in an optimal
>> >> way. An example can be:
>> >> - root bridge #1 has a PCI device A with a huge BAR and a PCI
>> >>   device B with a little BAR.
>> >> - root bridge #2 has a PCI device C with a medium BAR.
>> >> The best way to distribute resources over [2G, 3G) is A BAR, C BAR,
>> >> and only then B BAR.
>> >>
>> >>>> So it seems that we need a way to tell PciHostBridgeDxe driver
>> >>>> from the PciHostBridgeLib that all resources are sharable among
>> >>>> all root bridges.
>> >>
>> >> This is exactly what we need, indeed.
>> >>
>> >>>>
>> >>>> The real platform case is the allocation per root bridge and OVMF
>> >>>> case is the allocation per PCI domain.
>> >>
>> >> Indeed, bare metal servers use different PCI domain per host
>> >> bridge, but I've actually seen real servers that have multiple root
>> >> bridges sharing the same PCI domain, 0.
>> >>
>> >>>> Is my understanding correct?
>> >>
>> >> It is, and thank you for taking your time to understand the issue,
>> >> Marcel
>> >>
>> >>> [...]
>>
>> _______________________________________________
>> edk2-devel mailing list
>> [email protected]
>> https://lists.01.org/mailman/listinfo/edk2-devel
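
The largest-first ordering Marcel describes in the quoted thread (A BAR,
C BAR, and only then B BAR over [2G, 3G)) can be sketched in standalone
C as follows. This is an illustration only, not OVMF or QEMU code:
BAR_SKETCH and AllocateLargestFirst() are hypothetical names, the BAR
sizes used in the usage note are invented, and the sketch assumes
power-of-two BAR sizes with natural alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one BAR that needs an MMIO address. */
typedef struct {
  char     Name;        /* device label, e.g. 'A' */
  unsigned RootBridge;  /* owning root bridge number */
  uint64_t Size;        /* BAR size in bytes, must be a power of two */
  uint64_t Address;     /* filled in by AllocateLargestFirst() */
} BAR_SKETCH;

/* qsort() comparator: larger BARs sort first. */
static int
CompareBySizeDescending (const void *Left, const void *Right)
{
  const BAR_SKETCH *L = Left;
  const BAR_SKETCH *R = Right;

  if (L->Size > R->Size) return -1;
  if (L->Size < R->Size) return 1;
  return 0;
}

/*
 * Sort the BARs from biggest to smallest and place them back to back in
 * the shared window [Base, End), regardless of which root bridge owns
 * them. Returns 0 on success, -1 if the window is too small.
 */
int
AllocateLargestFirst (BAR_SKETCH *Bars, size_t Count, uint64_t Base, uint64_t End)
{
  uint64_t Next = Base;
  size_t   Index;

  qsort (Bars, Count, sizeof *Bars, CompareBySizeDescending);

  for (Index = 0; Index < Count; Index++) {
    uint64_t Size = Bars[Index].Size;
    uint64_t Aligned;

    assert (Size != 0 && (Size & (Size - 1)) == 0);

    /* Round up to the BAR's natural alignment. */
    Aligned = (Next + Size - 1) & ~(Size - 1);
    if (Aligned < Next || Aligned >= End || Size > End - Aligned) {
      return -1;
    }

    Bars[Index].Address = Aligned;
    Next = Aligned + Size;
  }

  return 0;
}
```

With invented sizes of 512 MiB for A (root bridge #1), 4 KiB for B (root
bridge #1) and 64 MiB for C (root bridge #2), the BARs are placed in the
order A, C, B. Each root bridge's apertures would then be derived from
where its own devices landed, which is why root bridge #1 can end up
with two separate ranges around the one used by root bridge #2.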

