On 02/22/2016 05:01 AM, Marcel Apfelbaum wrote:
Hi,
I am sorry again for the noise; I am resending this mail from the
appropriate mail address.


On 02/22/2016 09:58 AM, Ni, Ruiyu wrote:
 > Marcel, Laszlo,

Hi,

 > I went back to read the PciHostBridgeDxe driver in OvmfPkg and
 > below is my understanding to this driver's behavior:
 > The driver reads QEMU config "etc/extra-pci-roots" and promotes
 > bus from #1 to #extra-pci-roots to root bridges. Supposing there are
 > 10 buses and extra-pci-roots is 3, the bus #1, #2, #3 are promoted to
 > root bridge #1 #2 and #3 while the other buses are still behind main
 > bus #0.

Laszlo implemented it and he can provide more information, but I can say
the other buses will not always be behind the main bus #0.

The way it works is:
  - it scans bus #0 and all the buses behind it (by searching for PCI bridges)
  - once the first PCI hierarchy is finished, if extra-pci-roots > 0, it
    continues to search for other PCI roots (until it finds all
    extra-pci-roots)
  - for every extra PCI root, it again scans all the buses behind it.

So we can actually have secondary buses on the other PCI root buses as
well.
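
To make the steps above concrete, here is a minimal sketch in C. It is not the OvmfPkg code: the probe callback, the struct and the function names are made up for illustration, and it assumes (my simplification) that each root owns all bus numbers up to the next root found, with the last root owning the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PCI_MAX_BUS 255u

/* Bus number range owned by one root bridge (both ends inclusive). */
struct root_range {
    unsigned base;   /* root bus number */
    unsigned limit;  /* last bus number behind this root */
};

/* Hypothetical probe: returns true when `bus` answers as a root bus. */
typedef bool (*root_probe_fn)(unsigned bus, const void *ctx);

/* Bus 0 is always the main root. Keep searching upward until
 * `extra_roots` further roots were found; returns the number of
 * ranges written to out[]. */
size_t scan_roots(unsigned extra_roots, root_probe_fn probe, const void *ctx,
                  struct root_range *out, size_t cap)
{
    size_t n = 0;
    unsigned last_root = 0;
    unsigned found = 0;

    assert(probe != NULL && out != NULL && cap > 0);

    for (unsigned bus = 1; bus <= PCI_MAX_BUS && found < extra_roots; bus++) {
        if (!probe(bus, ctx))
            continue;
        /* Close the previous root's range just below the new root. */
        if (n < cap)
            out[n++] = (struct root_range){ last_root, bus - 1 };
        last_root = bus;
        found++;
    }
    /* The last root found owns every remaining bus number. */
    if (n < cap)
        out[n++] = (struct root_range){ last_root, PCI_MAX_BUS };
    return n;
}

/* Toy probe backed by a table of 256 flags, standing in for real
 * PCI configuration space reads. */
bool table_probe(unsigned bus, const void *ctx)
{
    const bool *is_root = ctx;
    return is_root[bus];
}
```

With extra roots at buses 4 and 9 and extra-pci-roots = 2, this sketch reports three ranges: 0-3, 4-8 and 9-255.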


 >
 > I am thinking if we change the PciHostBridgeDxe driver to only
 > expose one root bridge (main bus), what it will break?
 >
 > The behavior of PciHostBridgeDxe to whether install multiple
 > root bridges or single root bridge doesn't impact OS behavior.
 > OS doesn't query the DXE core protocol database to find
 > all the root bridge IO instances. So why not we just simplify the
 > driver to expose one root bridge covering the main bus?
 >

I'll try to rephrase the question to be sure I understand it:
"Why do we need the extra PCI roots at all if they are in the same PCI
domain and share the same resources?"

The short answer is that OSes can associate a PCI root with only one
NUMA node.

Now to the long answer:
What happens if we have a VM with memory/CPUs from multiple host NUMA nodes
and we want to assign a PCI device from one of the host NUMA nodes?
The only way we can associate this device with the correct NUMA node is
by putting it behind a PCI root bridge in the proximity of that NUMA node;
otherwise performance will suffer greatly.

The above is also true for bare metal machines. I looked again and found
a machine with this kind of configuration:

System:
     IBM System x3550 M4 Server

lspci -vt:
  -+-[0000:ff]-+-08.0  Intel Corporation Xeon E5/Core i7 QPI Link 0
  |           +-08.2  Intel Corporation Device 3c41
  |           [...]
  |           +-13.5  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor
  |           \-13.6  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor
  +-[0000:80]-+-00.0-[81-85]--
  |           +-02.0-[86-8a]--
  |           [...]
  |           +-05.0  Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management
  |           \-05.2  Intel Corporation Xeon E5/Core i7 Control Status and Global Errors
  +-[0000:7f]-+-08.0  Intel Corporation Xeon E5/Core i7 QPI Link 0
  |           +-08.2  Intel Corporation Device 3c41
  |           +-08.3  Intel Corporation Xeon E5/Core i7 QPI Link Reut 0
  |           [...]
  |           +-13.5  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor
  |           \-13.6  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor
  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
              +-01.0-[0c-10]--
              +-02.0-[11-15]--+-00.0  Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
              |               \-00.1  Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
              [...]


iasl DSDT:


[...]
     Name (\BBI0, 0x00000000)
     Name (\BBI1, 0x00000080)
[...]

  Scope (\_SB)
  {
  [...]
     Device (IOH0)
         {
             Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: Hardware ID
             Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID
             Name (_UID, 0x00)  // _UID: Unique ID
             Method (_BBN, 0, NotSerialized)  // _BBN: BIOS Bus Number
             {
                 Return (BBI0) /* \BBI0 */
             }
             [...]
             Name (PBR0, ResourceTemplate ()
             {
                 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
                     0x0000,             // Granularity
                     0x0000,             // Range Minimum
                     0x007F,             // Range Maximum
                     0x0000,             // Translation Offset
                     0x0080,             // Length
                     ,, )
                 IO (Decode16,
                     0x0CF8,             // Range Minimum
                     0x0CF8,             // Range Maximum
                     0x01,               // Alignment
                     0x08,               // Length
                     )
                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                     0x0000,             // Granularity
                     0x0000,             // Range Minimum
                     0x0CF7,             // Range Maximum
                     0x0000,             // Translation Offset
                     0x0CF8,             // Length
                     ,, , TypeStatic)
                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                     0x0000,             // Granularity
                     0x1000,             // Range Minimum
                     0xBFFF,             // Range Maximum
                     0x0000,             // Translation Offset
                     0xB000,             // Length
                     ,, , TypeStatic)
                [...]
             }
          /* the above range will be part of CRS after some logic */
         [...]
        }
         Device (IOH1)
         {
             Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: Hardware ID
             Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID
             Name (_UID, 0x01)  // _UID: Unique ID
             Method (_BBN, 0, NotSerialized)  // _BBN: BIOS Bus Number
             {
                 Return (BBI1) /* \BBI1 */
             }
             [...]
             Name (PBR0, ResourceTemplate ()
             {
                 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
                     0x0000,             // Granularity
                     0x0080,             // Range Minimum
                     0x00FF,             // Range Maximum
                     0x0000,             // Translation Offset
                     0x0080,             // Length
                     ,, )
                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                     0x0000,             // Granularity
                     0xC000,             // Range Minimum
                     0xFFFF,             // Range Maximum
                     0x0000,             // Translation Offset
                     0x4000,             // Length
                     ,, , TypeStatic)
             }
[...]

As you can see, we have multiple PCI roots sharing the PCI domain 0
resources.
I found this configuration quite common in the machines I work with.
Those machines have BIOS rather than UEFI firmware, but I really think
edk2 will benefit from being compatible with the above.
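
As a small illustration of "sharing the domain 0 resources", the sketch below checks that the bus number and I/O port windows of two roots in one domain do not overlap. The struct and function names are made up for this example; the test values one would feed it are the ones from the DSDT excerpt above (IOH0: buses 0x00-0x7F, I/O 0x0000-0x0CF7 and 0x1000-0xBFFF; IOH1: buses 0x80-0xFF, I/O 0xC000-0xFFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded window, both ends inclusive (as in the ASL descriptors). */
struct range {
    uint32_t min;
    uint32_t max;
};

/* Matches the "Length" field of the descriptors: max - min + 1. */
uint32_t range_length(struct range r)
{
    return r.max - r.min + 1;
}

/* Two inclusive ranges overlap when each starts before the other ends. */
bool ranges_overlap(struct range a, struct range b)
{
    return a.min <= b.max && b.min <= a.max;
}

/* True if no window of root A overlaps any window of root B. */
bool windows_disjoint(const struct range *a, size_t na,
                      const struct range *b, size_t nb)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (ranges_overlap(a[i], b[j]))
                return false;
    return true;
}
```

For the two IOH devices above, both the bus windows and the I/O windows pass this check, which is what lets the two roots coexist in a single PCI domain.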

I hope this helps in understanding the issue,
Marcel


Here's another example of a bare metal machine with multiple PCI roots, although they do not share resources (SGI UV1000, edited for brevity):

-+-[1005:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           +-00.2  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 2
 |           +-00.4  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 3
 |           +-00.6  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 4
 |           +-01.0  Intel Corporation Xeon Processor E7 Product Family Power Controller
 |           +-02.0  Intel Corporation Xeon Processor E7 Product Family Caching Agent 0
 |           +-03.0  Intel Corporation Xeon Processor E7 Product Family Caching Agent 1
 |           +-04.0  Intel Corporation Xeon Processor E7 Product Family QPI Home Agent 0
 |           +-05.0  Intel Corporation Xeon Processor E7 Product Family Memory Controller 0a
 |           +-05.2  Intel Corporation Xeon Processor E7 Product Family Memory Controller 0b
 |           +-05.4  Intel Corporation Xeon Processor E7 Product Family Memory Controller 0c
 |           +-06.0  Intel Corporation Xeon Processor E7 Product Family QPI Home Agent 1
 |           +-07.0  Intel Corporation Xeon Processor E7 Product Family Memory Controller 1a
 |           +-07.2  Intel Corporation Xeon Processor E7 Product Family Memory Controller 1b
 |           +-07.4  Intel Corporation Xeon Processor E7 Product Family Memory Controller 1c
 |           +-08.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 0
 |           +-09.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 1
 |           +-0a.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 2
 |           +-0b.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 3
 |           +-0c.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 4
 |           +-0d.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 5
 |           +-0e.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 6
 |           +-0f.0  Intel Corporation Xeon Processor E7 Product Family Last Level Cache Coherence Engine 7
 |           +-10.0  Intel Corporation Xeon Processor E7 Product Family QPI Router Port 0-1
 |           +-10.2  Intel Corporation Xeon Processor E7 Product Family QPI Router Port 2-3
 |           +-10.4  Intel Corporation Device 2b32
 |           +-10.6  Intel Corporation Device 2b3a
 |           +-11.0  Intel Corporation Xeon Processor E7 Product Family QPI Router Port 4-5
 |           +-11.2  Intel Corporation Xeon Processor E7 Product Family QPI Router Port 6-7
 |           +-11.4  Intel Corporation Device 2b36
 |           +-11.6  Intel Corporation Device 2b3e
 |           +-12.0  Intel Corporation Xeon Processor E7 Product Family Test and Debug 0
 |           +-13.0  Intel Corporation Xeon Processor E7 Product Family Test and Debug 1
 |           +-14.0  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 0: REUT control/status
 |           +-14.2  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 0: Misc. control/status
 |           +-15.0  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 1: REUT control/status
 |           +-15.2  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 1: Misc. control/status
 |           +-16.0  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 2: REUT control/status
 |           +-16.2  Intel Corporation Xeon Processor E7 Product Family QPI Physical Port 2: Misc. control/status
 |           +-18.0  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 0: REUT control/status
 |           +-18.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 0: Misc control/status
 |           +-19.0  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: REUT control/status
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1005:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1004:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1004:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1003:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1003:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1002:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1002:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1001:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1001:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1000:3f]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[1000:3e]-+-00.0  Intel Corporation Xeon Processor E7 Product Family System Configuration Controller 1
 |           + ...
 |           \-19.2  Intel Corporation Xeon Processor E7 Product Family SMI Physical Port 1: Misc control/status
 +-[0001:00]-+-00.0  Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
 |           +-03.0-[01-02]----00.0  LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
 |           +-07.0-[03]----00.0  LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS
 |           +-0d.0  Intel Corporation Device 343a
 |           +-0d.1  Intel Corporation Device 343b
 |           +-0d.2  Intel Corporation Device 343c
 |           +-0d.3  Intel Corporation Device 343d
 |           +-0d.4  Intel Corporation 5520/5500/X58 Physical Layer Port 0
 |           +-0d.5  Intel Corporation 5520/5500 Physical Layer Port 1
 |           +-0d.6  Intel Corporation Device 341a
 |           +-0d.7  Intel Corporation Device 341b
 |           +-0e.0  Intel Corporation Device 341c
 |           +-0e.1  Intel Corporation Device 341d
 |           +-0e.2  Intel Corporation Device 341e
 |           +-0e.3  Intel Corporation Device 341f
 |           +-0e.4  Intel Corporation Device 3439
 |           +-10.0  Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0
 |           +-10.1  Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0
 |           +-11.0  Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1
 |           +-11.1  Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1
 |           +-13.0  Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller
 |           +-14.0  Intel Corporation 5520/5500/X58 I/O Hub System Management Registers
 |           +-14.1  Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
 |           +-14.2  Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers
 |           +-14.3  Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers
 |           +-16.0  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.1  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.2  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.3  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.4  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.5  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           +-16.6  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 |           \-16.7  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
 \-[0000:00]-+-00.0  Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
             +-01.0-[01-02]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
             |               \-00.1  Intel Corporation 82576 Gigabit Network Connection
             +-03.0-[03]----00.0  LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS
             +-0d.0  Intel Corporation Device 343a
             +-0d.1  Intel Corporation Device 343b
             +-0d.2  Intel Corporation Device 343c
             +-0d.3  Intel Corporation Device 343d
             +-0d.4  Intel Corporation 5520/5500/X58 Physical Layer Port 0
             +-0d.5  Intel Corporation 5520/5500 Physical Layer Port 1
             +-0d.6  Intel Corporation Device 341a
             +-0d.7  Intel Corporation Device 341b
             +-0e.0  Intel Corporation Device 341c
             +-0e.1  Intel Corporation Device 341d
             +-0e.2  Intel Corporation Device 341e
             +-0e.3  Intel Corporation Device 341f
             +-0e.4  Intel Corporation Device 3439
             +-10.0  Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0
             +-10.1  Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0
             +-11.0  Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1
             +-11.1  Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1
             +-13.0  Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller
             +-14.0  Intel Corporation 5520/5500/X58 I/O Hub System Management Registers
             +-14.1  Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
             +-14.2  Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers
             +-14.3  Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers
             +-16.0  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.1  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.2  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.3  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.4  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.5  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.6  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-16.7  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
             +-1a.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
             +-1a.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
             +-1a.2  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
             +-1a.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
             +-1c.0-[04]----00.0  Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1)
             +-1d.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
             +-1d.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
             +-1d.2  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
             +-1d.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
             +-1e.0-[05]--
             +-1f.0  Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
             \-1f.3  Intel Corporation 82801JI (ICH10 Family) SMBus Controller

On UV machines, only the legacy socket on segment 0 (implemented by host bridge 0, and containing the southbridge) is addressable in 32-bit PCIe config space. The other sockets each get two segments at high addresses, one for I/O and one for socket-internal devices (memory controllers, etc.). Each socket is allocated a disjoint 32-bit MMIO range, 64-bit MMIO range, and I/O port range as needed.

We'd like the generic PCI code to have full support for complex topologies like this. For example:
- Assume the presence of dozens if not hundreds of segments and root bridges
- Don't store segment or root bridge lists in small, fixed-size arrays or bitmaps
- Use PciSegmentLib rather than PciExpressLib, since the latter supports only one segment
- Never use PciCf8Lib, for the same reason
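
To illustrate the kind of data structures I mean, here is a rough sketch in C of a root bridge list that grows on demand and of a config address that carries the segment number. All names are made up for this example; it is not the PciSegmentLib interface, and the bit layout of the address is just one plausible packing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One root bridge: segment number plus the bus range it decodes. */
struct root_bridge {
    uint16_t segment;
    uint8_t  bus_base;
    uint8_t  bus_limit;
};

/* Growable list, so the number of segments/bridges is not capped
 * by a compile-time array size. */
struct root_bridge_list {
    struct root_bridge *items;
    size_t count;
    size_t capacity;
};

/* Append one bridge, doubling the storage when full.
 * Returns 0 on success, -1 if memory could not be allocated. */
int root_bridge_list_add(struct root_bridge_list *list, struct root_bridge rb)
{
    assert(list != NULL);

    if (list->count == list->capacity) {
        size_t cap = list->capacity ? list->capacity * 2 : 8;
        struct root_bridge *p = realloc(list->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        list->items = p;
        list->capacity = cap;
    }
    list->items[list->count++] = rb;
    return 0;
}

/* Illustrative packing: the segment lives above bit 32, so two
 * devices with the same bus/device/function on different segments
 * never share an address. */
uint64_t cfg_address(uint16_t segment, uint8_t bus, uint8_t dev,
                     uint8_t fn, uint16_t reg)
{
    return ((uint64_t)segment << 32) |
           ((uint64_t)bus << 20) |
           ((uint64_t)(dev & 0x1f) << 15) |
           ((uint64_t)(fn & 0x07) << 12) |
           (uint64_t)(reg & 0x0fff);
}
```

The point is only that nothing here depends on a maximum segment count, and that segment 0x1005 bus 0x3f is a different address from segment 0 bus 0x3f.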

I haven't looked at the generic PciHostBridgeDxe driver in this patch series to know if you do that or not... So please don't take this message as criticism. I'm just providing another report from the "real" world.

Thanks,
Brian



 >
 > Regards,
 > Ray
 >
 >
 >> -----Original Message-----
 >> From: Marcel Apfelbaum [mailto:marcel.apfelb...@gmail.com]
 >> Sent: Monday, February 8, 2016 6:56 PM
 >> To: Ni, Ruiyu <ruiyu...@intel.com>; Laszlo Ersek <ler...@redhat.com>
 >> Cc: Justen, Jordan L <jordan.l.jus...@intel.com>; edk2-de...@ml01.01.org;
 >> Tian, Feng <feng.t...@intel.com>; Fan, Jeff <jeff....@intel.com>
 >> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
 >> PciHostBridgeDxe driver.
 >>
 >> Hi,
 >>
 >> I am sorry for the noise, I am re-sending this mail from an e-mail
 >> address subscribed to the list.
 >>
 >> Thanks,
 >> Marcel
 >>
 >> On 02/08/2016 12:41 PM, Marcel Apfelbaum wrote:
 >>> On 02/06/2016 09:09 AM, Ni, Ruiyu wrote:
 >>>> Marcel,
 >>>> Please see my reply embedded below.
 >>>>
 >>>> On 2016-02-02 19:07, Laszlo Ersek wrote:
 >>>>> On 02/01/16 16:07, Marcel Apfelbaum wrote:
 >>>>>> On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
 >>>>>>> Laszlo,
 >>>>>>> I now understand your problem.
 >>>>>>> Can you tell me why OVMF needs multiple root bridges support?
 >>>>>>> My understanding to OVMF is it's a firmware which can be used in a
 >>>>>>> guest VM environment to boot OS.
 >>>>>>> Multiple root bridges requirement currently mainly comes from
 >>>>>>> high-end servers.
 >>>>>>> Do you mean that the VM guest needs to be like a high-end server?
 >>>>>>> This may help me to think about the possible solution to your
 >>>>>>> problem.
 >>>>>> Hi Ray,
 >>>>>>
 >>>>>> Laszlo's explanation is very good, this is not exactly about
 >>>>>> high-end VMs, we need the extra root bridges to match assigned
 >>>>>> devices to their corresponding NUMA node.
 >>>>>>
 >>>>>> Regarding the OVMF issue, the main problem is that the extra root
 >>>>>> bridges are created dynamically for the VMs (command line parameter)
 >>>>>> and their resources are computed on the fly.
 >>>>>>
 >>>>>> Not directly related to the above, the optimal way to allocate
 >>>>>> resources for PCI root bridges sharing the same PCI domain is to
 >>>>>> sort devices MEM/IO ranges from the biggest to smallest and use
 >>>>>> this order during allocation.
 >>>>>>
 >>>>>> After the resources allocation is finished we can build the CRS
 >>>>>> for each PCI root bridge and pass it back to firmware/OS.
 >>>>>>
 >>>>>> While for "real" machines we can hard-code the root bridge
 >>>>>> resources in some ROM and have it extracted early in the boot
 >>>>>> process, for the VM world this would not be possible. Also
 >>>>>> any effort to divide the resources range before the resource
 >>>>>> allocation would be odd and far from optimal.
 >>
 >> Hi Ray,
 >> Thank you for your response,
 >>
 >>>> Real machine uses hard-code resources for root bridges. But when the
 >>>> resource cannot meet certain root bridges' requirement, firmware can
 >>>> save the real resource requirement per root bridges to NV storage and
 >>>> divide the resources to each root bridge in next boot according to the
 >>>> NV settings.
 >>>> The MMIO/IO routine in the real machine I mentioned above needs to be
 >>>> fixed in a very earlier phase before the PciHostBridgeDxe driver runs.
 >>>> That's to say if [2G, 2.8G) is configured to route to root bridge #1,
 >>>> only [2G, 2.8G) is allowed to assigned to root bridge #1.  And the
 >>>> routine cannot be changed unless a platform reset is performed.
 >>
 >> I understand.
 >>
 >>>>
 >>>> Based on your description, it sounds like all the root bridges in
 >>>> OVMF share the same range of resource and any MMIO/IO in the range
 >>>> can be route to any root bridge. For example, every root bridge can
 >>>> use [2G, 3G) MMIO.
 >>>
 >>> Exactly. This is true for "snooping" host-bridges which do not have
 >>> their own configuration registers (or MMConfig region). They are
 >>> sniffing host-bridge 0 for configuration cycles and if they are meant
 >>> for a device on a bus number owned by them, they will forward the
 >>> transaction to their primary root bus.
 >>>
 >>>> Until in allocation phase, root bridge #1 is assigned to [2G, 2.8G),
 >>>> #2 is assigned to [2.8G, 2.9G), #3 is assigned to [2.9G, 3G).
 >>
 >> Correct, but the regions do not have to be disjoint in the above
 >> scenario. Root bridge #1 can have [2G, 2.4G) and [2.8G, 3G) while root
 >> bridge #2 can have [2.4G, 2.8G).
 >>
 >> This is so the firmware can distribute the resources in an optimal
 >> way. An example can be:
 >>     - root bridge #1 has a PCI device A with a huge BAR and a PCI
 >>       device B with a little BAR.
 >>     - root bridge #2 has a PCI device C with a medium BAR.
 >> The best way to distribute resources over [2G, 3G) is A BAR, C BAR,
 >> and only then B BAR.
 >>
 >>>> So it seems that we need a way to tell PciHostBridgeDxe driver from
 >>>> the PciHostBridgeLib that all resources are sharable among all root
 >>>> bridges.
 >>
 >> This is exactly what we need, indeed.
 >>
 >>>>
 >>>> The real platform case is the allocation per root bridge and OVMF
 >>>> case is the allocation per PCI domain.
 >>
 >> Indeed, bare metal servers use different PCI domain per host bridge,
 >> but I've actually seen real servers that have multiple root bridges
 >> sharing the same PCI domain, 0.
 >>
 >>
 >>>> Is my understanding correct?
 >>
 >> It is, and thank you for taking your time to understand the issue,
 >> Marcel
 >>
 >>>>
 >>> [...]
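
The allocation order described in the quoted message (largest BAR first, across all root bridges sharing one window) can be sketched in C as follows. This is only an illustration under stated assumptions: BAR sizes are powers of two so the size doubles as the alignment, the struct and function names are hypothetical, and the sizes used for A, B and C in any test are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One BAR that needs space in the shared MMIO window. */
struct bar_request {
    const char *name;         /* device label, e.g. "A" */
    unsigned    root_bridge;  /* root bridge the device sits behind */
    uint64_t    size;         /* power of two; also used as alignment */
    uint64_t    base;         /* filled in by allocate_window() */
};

/* qsort comparator: larger sizes first. */
static int by_size_desc(const void *a, const void *b)
{
    const struct bar_request *x = a;
    const struct bar_request *y = b;
    return (x->size < y->size) - (x->size > y->size);
}

/* Sort all requests from biggest to smallest, regardless of root
 * bridge, then place them back to back in [win_base, win_limit).
 * Returns 0 on success, -1 if the window is too small. */
int allocate_window(struct bar_request *req, size_t n,
                    uint64_t win_base, uint64_t win_limit)
{
    uint64_t next = win_base;

    assert(req != NULL);

    qsort(req, n, sizeof *req, by_size_desc);
    for (size_t i = 0; i < n; i++) {
        /* Round up to the BAR's natural alignment. */
        uint64_t aligned = (next + req[i].size - 1) & ~(req[i].size - 1);
        if (aligned + req[i].size > win_limit)
            return -1;
        req[i].base = aligned;
        next = aligned + req[i].size;
    }
    return 0;
}
```

With A and B behind root bridge #1 and C behind root bridge #2, the order comes out as A, C, B, so root bridge #1 ends up with two non-adjacent ranges. Its CRS can therefore only be built after the allocation is done.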


_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


--

                                                Brian J. Johnson

--------------------------------------------------------------------

  My statements are my own, are not authorized by SGI, and do not
  necessarily represent SGI’s positions.