Marcel,
I see two requirements from your mail:
1. Non-contiguous resources: root bridge #1 uses [2G, 2.4G) and [2.8G, 3G)
   while root bridge #2 uses [2.4G, 2.8G).
2. Sharable resources among root bridges: all root bridges in the same PCI
   segment can share one common range of resources.

Requirement #1 is not supported by the MdeModulePkg/PciBus driver, and I guess
it's not an urgent requirement and doesn't block the OVMF PciHostBridge porting.
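For context, in the PciHostBridgeLib interface proposed by this series, each
resource type of a root bridge is described by a single window, which is why
two disjoint MMIO ranges can't be expressed. Sketched below from memory (field
layout paraphrased, not the authoritative header):

//
// Paraphrased sketch of the aperture description from PciHostBridgeLib.h in
// this series; the exact field list is from memory. The point: one Base/Limit
// pair per resource type, so a root bridge can't describe [2G, 2.4G) plus
// [2.8G, 3G) at the same time.
//
typedef struct {
  UINT64  Base;    // low end of the window
  UINT64  Limit;   // inclusive high end of the window
} PCI_ROOT_BRIDGE_APERTURE;

typedef struct {
  // ... Segment, Supports, Attributes, DevicePath, etc. omitted ...
  PCI_ROOT_BRIDGE_APERTURE  Bus;          // bus number range
  PCI_ROOT_BRIDGE_APERTURE  Io;           // I/O aperture
  PCI_ROOT_BRIDGE_APERTURE  Mem;          // MMIO below 4GB
  PCI_ROOT_BRIDGE_APERTURE  MemAbove4G;   // MMIO above 4GB
} PCI_ROOT_BRIDGE;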

Requirement #2 can be interpreted as: it is valid for the resources claimed by
different root bridges to overlap, no matter which segment they belong to.

The overlap can be like root bridge #1 claiming [2G, 2.4G) while root bridge #2
claims [2.2G, 2.6G) -- [2.2G, 2.4G) is shared by both root bridges.
In such a case, PCI devices under root bridge #1 can only use resources in
[2G, 2.4G) and devices under root bridge #2 can only use [2.2G, 2.6G). The GCD
services can guarantee there is no resource conflict -- if [2.2G, 2.3G) is used
by a device under root bridge #1, it won't be used by a device under root
bridge #2.

An extreme case is both root bridges claiming [2G, 3G), which is the OVMF case.
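To illustrate the guarantee in the fully overlapping case (illustrative
fragment only, not code from the patch; it assumes [2G, 3G) was already added
to GCD as MMIO space and that ImageHandle is the driver's image handle):

  //
  // Once a sub-range has been allocated for a device under root bridge #1,
  // a second address-specific allocation of the same range fails, so GCD
  // arbitrates the shared aperture between the root bridges.
  //
  EFI_STATUS            Status;
  EFI_PHYSICAL_ADDRESS  Address;

  Address = 0x88000000;               // 2.125G, inside both bridges' claims
  Status  = gDS->AllocateMemorySpace (
                   EfiGcdAllocateAddress,
                   EfiGcdMemoryTypeMemoryMappedIo,
                   0,                 // Alignment
                   SIZE_1MB,
                   &Address,
                   ImageHandle,
                   NULL
                   );                 // first claim succeeds
  Status  = gDS->AllocateMemorySpace (
                   EfiGcdAllocateAddress,
                   EfiGcdMemoryTypeMemoryMappedIo,
                   0,
                   SIZE_1MB,
                   &Address,
                   ImageHandle,
                   NULL
                   );                 // second claim of the same range fails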

So the change to PciHostBridgeDxe can be:
1. Check whether the resources claimed by the root bridges are already added,
   and call AddMemorySpace/AddIoSpace for those resource ranges which haven't
   been added yet.
2. Call AllocateMemorySpace/AllocateIoSpace to occupy these resources in GCD.
   The allocation shouldn't fail; otherwise it's a fatal error, and the
   PciHostBridgeDxe driver will assert and exit (see the sketch below).
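
A minimal sketch of both steps for one MMIO range, using the DXE GCD services
from the PI spec (gDS via DxeServicesTableLib). The helper name is made up,
and a complete version would walk every GCD descriptor overlapping the range
instead of checking only the descriptor at Base:

#include <PiDxe.h>
#include <Library/DxeServicesTableLib.h>
#include <Library/DebugLib.h>

//
// Hypothetical helper: make one MMIO range claimed by a root bridge known
// to GCD (step 1), then occupy it (step 2).
//
EFI_STATUS
AddAndAllocateMmioRange (
  IN EFI_HANDLE            ImageHandle,
  IN EFI_PHYSICAL_ADDRESS  Base,
  IN UINT64                Length
  )
{
  EFI_STATUS                       Status;
  EFI_GCD_MEMORY_SPACE_DESCRIPTOR  Descriptor;
  EFI_PHYSICAL_ADDRESS             Address;

  //
  // Step 1: add the range only if GCD doesn't know about it yet.
  // (Simplified: only the descriptor at Base is checked here.)
  //
  Status = gDS->GetMemorySpaceDescriptor (Base, &Descriptor);
  ASSERT_EFI_ERROR (Status);
  if (Descriptor.GcdMemoryType == EfiGcdMemoryTypeNonExistent) {
    Status = gDS->AddMemorySpace (
                    EfiGcdMemoryTypeMemoryMappedIo,
                    Base,
                    Length,
                    EFI_MEMORY_UC
                    );
    ASSERT_EFI_ERROR (Status);
  }

  //
  // Step 2: occupy the range. A failure means another agent already owns
  // (part of) the range -- treat it as fatal and assert.
  //
  Address = Base;
  Status  = gDS->AllocateMemorySpace (
                   EfiGcdAllocateAddress,
                   EfiGcdMemoryTypeMemoryMappedIo,
                   0,                  // Alignment
                   Length,
                   &Address,
                   ImageHandle,
                   NULL
                   );
  ASSERT_EFI_ERROR (Status);
  return Status;
}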

Regards,
Ray


>-----Original Message-----
>From: edk2-devel [mailto:[email protected]] On Behalf Of
>Marcel Apfelbaum
>Sent: Monday, February 22, 2016 7:02 PM
>To: Ni, Ruiyu <[email protected]>; Laszlo Ersek <[email protected]>
>Cc: Justen, Jordan L <[email protected]>; [email protected];
>Tian, Feng <[email protected]>; Fan, Jeff <[email protected]>
>Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
>PciHostBridgeDxe driver.
>
>Hi,
>I am sorry again for the noise; I am resending the mail from the appropriate
>mail address.
>
>
>On 02/22/2016 09:58 AM, Ni, Ruiyu wrote:
> > Marcel, Laszlo,
>
>Hi,
>
> > I went back to read the PciHostBridgeDxe driver in OvmfPkg and
> > below is my understanding of this driver's behavior:
> > The driver reads the QEMU config "etc/extra-pci-roots" and promotes
> > buses #1 to #extra-pci-roots to root bridges. Supposing there are
> > 10 buses and extra-pci-roots is 3, buses #1, #2, and #3 are promoted to
> > root bridges #1, #2, and #3 while the other buses are still behind main
> > bus #0.
>
>Laszlo implemented it and he can provide more information, but I can say
>the other buses will not always be behind the main bus #0.
>
>The way it works is:
>  - scans bus #0 and all the buses behind it (by searching for PCI bridges)
>  - once the first PCI hierarchy is finished, if extra-pci-roots > 0,
>    continues to search for other PCI roots (until it finds all
>    extra-pci-roots)
>  - for every extra PCI root, scans again all the buses behind it.
>
>So we can actually have secondary buses on the other PCI root buses as well.
>
>
> >
> > I am wondering: if we change the PciHostBridgeDxe driver to only
> > expose one root bridge (the main bus), what will it break?
> >
> > Whether PciHostBridgeDxe installs multiple root bridges or a single
> > root bridge doesn't impact OS behavior.
> > The OS doesn't query the DXE core protocol database to find
> > all the root bridge IO instances. So why don't we just simplify the
> > driver to expose one root bridge covering the main bus?
> >
>
>I'll try to rephrase the question in order to be sure I understand it.
>"Why do we need the extra PCI roots at all if they are in the same PCI domain
>  and share the same resources?"
>
>The short answer is that one PCI root can be associated by the OSes
>with only one NUMA node.
>
>Now to the long answer:
>What happens if we have a VM with memory/CPUs from multiple host NUMA nodes
>and we want to assign a PCI device from one of the host NUMA nodes?
>The only way we can associate this device with the correct NUMA node is by
>putting it behind a PCI root bridge in the proximity of that NUMA node;
>otherwise the performance will greatly suffer.
>
>The above is also true for bare metal machines; I looked again and found this
>machine having this kind of configuration:
>
>System:
>     IBM System x3550 M4 Server
>
>lspci -vt:
>  -+-[0000:ff]-+-08.0  Intel Corporation Xeon E5/Core i7 QPI Link 0
>  |           +-08.2  Intel Corporation Device 3c41
>  |           [...]
>  |           +-13.5  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor
>  |           \-13.6  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor
>  +-[0000:80]-+-00.0-[81-85]--
>  |           +-02.0-[86-8a]--
>  |           [...]
>  |           +-05.0  Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management
>  |           \-05.2  Intel Corporation Xeon E5/Core i7 Control Status and Global Errors
>  +-[0000:7f]-+-08.0  Intel Corporation Xeon E5/Core i7 QPI Link 0
>  |           +-08.2  Intel Corporation Device 3c41
>  |           +-08.3  Intel Corporation Xeon E5/Core i7 QPI Link Reut 0
>  |           [...]
>  |           +-13.5  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor
>  |           \-13.6  Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor
>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>              +-01.0-[0c-10]--
>              +-02.0-[11-15]--+-00.0  Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
>              |               \-00.1  Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
>              [...]
>
>
>iasl DSDT:
>
>
>[...]
>     Name (\BBI0, 0x00000000)
>     Name (\BBI1, 0x00000080)
>[...]
>
>  Scope (\_SB)
>  {
>  [...]
>     Device (IOH0)
>         {
>             Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: Hardware ID
>             Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID
>             Name (_UID, 0x00)  // _UID: Unique ID
>             Method (_BBN, 0, NotSerialized)  // _BBN: BIOS Bus Number
>             {
>                 Return (BBI0) /* \BBI0 */
>             }
>             [...]
>             Name (PBR0, ResourceTemplate ()
>             {
>                 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
>                     0x0000,             // Granularity
>                     0x0000,             // Range Minimum
>                     0x007F,             // Range Maximum
>                     0x0000,             // Translation Offset
>                     0x0080,             // Length
>                     ,, )
>                 IO (Decode16,
>                     0x0CF8,             // Range Minimum
>                     0x0CF8,             // Range Maximum
>                     0x01,               // Alignment
>                     0x08,               // Length
>                     )
>                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>                     0x0000,             // Granularity
>                     0x0000,             // Range Minimum
>                     0x0CF7,             // Range Maximum
>                     0x0000,             // Translation Offset
>                     0x0CF8,             // Length
>                     ,, , TypeStatic)
>                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>                     0x0000,             // Granularity
>                     0x1000,             // Range Minimum
>                     0xBFFF,             // Range Maximum
>                     0x0000,             // Translation Offset
>                     0xB000,             // Length
>                     ,, , TypeStatic)
>                [...]
>             }
>          /* the above range will be part of CRS after some logic */
>         [...]
>        }
>         Device (IOH1)
>         {
>             Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: Hardware ID
>             Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID
>             Name (_UID, 0x01)  // _UID: Unique ID
>             Method (_BBN, 0, NotSerialized)  // _BBN: BIOS Bus Number
>             {
>                 Return (BBI1) /* \BBI1 */
>             }
>             [...]
>             Name (PBR0, ResourceTemplate ()
>             {
>                 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
>                     0x0000,             // Granularity
>                     0x0080,             // Range Minimum
>                     0x00FF,             // Range Maximum
>                     0x0000,             // Translation Offset
>                     0x0080,             // Length
>                     ,, )
>                 WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>                     0x0000,             // Granularity
>                     0xC000,             // Range Minimum
>                     0xFFFF,             // Range Maximum
>                     0x0000,             // Translation Offset
>                     0x4000,             // Length
>                     ,, , TypeStatic)
>             }
>[...]
>
>As you can see, we have multiple PCI roots sharing the PCI domain 0
>resources.
>I found this configuration quite common in the machines I work with.
>Those machines have a BIOS and not UEFI firmware, but I really think
>edk2 will benefit from being compatible with the above.
>
>I hope this helps in understanding the issue,
>Marcel
>
>
>
> >
> > Regards,
> > Ray
> >
> >
> >> -----Original Message-----
> >> From: Marcel Apfelbaum [mailto:[email protected]]
> >> Sent: Monday, February 8, 2016 6:56 PM
> >> To: Ni, Ruiyu <[email protected]>; Laszlo Ersek <[email protected]>
> >> Cc: Justen, Jordan L <[email protected]>;
>[email protected];
> >> Tian, Feng <[email protected]>; Fan, Jeff <[email protected]>
> >> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
> >> PciHostBridgeDxe driver.
> >>
> >> Hi,
> >>
> >> I am sorry for the noise; I am re-sending this mail from an e-mail address
> >> subscribed to the list.
> >>
> >> Thanks,
> >> Marcel
> >>
> >> On 02/08/2016 12:41 PM, Marcel Apfelbaum wrote:
> >>> On 02/06/2016 09:09 AM, Ni, Ruiyu wrote:
> >>>> Marcel,
> >>>> Please see my reply embedded below.
> >>>>
> >>>> On 2016-02-02 19:07, Laszlo Ersek wrote:
> >>>>> On 02/01/16 16:07, Marcel Apfelbaum wrote:
> >>>>>> On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
> >>>>>>> Laszlo,
> >>>>>>> I now understand your problem.
> >>>>>>> Can you tell me why OVMF needs multiple root bridge support?
> >>>>>>> My understanding of OVMF is that it's a firmware which can be used
> >>>>>>> in a guest VM environment to boot an OS.
> >>>>>>> The multiple root bridges requirement currently comes mainly from
> >>>>>>> high-end servers.
> >>>>>>> Do you mean that the VM guest needs to be like a high-end server?
> >>>>>>> This may help me to think about a possible solution to your problem.
> >>>>>> Hi Ray,
> >>>>>>
> >>>>>> Laszlo's explanation is very good; this is not exactly about high-end
> >>>>>> VMs, we need the extra root bridges to match assigned devices to their
> >>>>>> corresponding NUMA node.
> >>>>>>
> >>>>>> Regarding the OVMF issue, the main problem is that the extra root
> >>>>>> bridges are created dynamically for the VMs (command line parameter)
> >>>>>> and their resources are computed on the fly.
> >>>>>>
> >>>>>> Not directly related to the above, the optimal way to allocate
> >>>>>> resources for PCI root bridges sharing the same PCI domain is to sort
> >>>>>> devices' MEM/IO ranges from the biggest to the smallest and use this
> >>>>>> order during allocation.
> >>>>>>
> >>>>>> After the resource allocation is finished, we can build the CRS for
> >>>>>> each PCI root bridge and pass it back to the firmware/OS.
> >>>>>>
> >>>>>> While for "real" machines we can hard-code the root bridge resources
> >>>>>> in some ROM and have it extracted early in the boot process, for the
> >>>>>> VM world this would not be possible. Also, any effort to divide the
> >>>>>> resource range before the resource allocation would be odd and far
> >>>>>> from optimal.
> >>
> >> Hi Ray,
> >> Thank you for your response,
> >>
> >>>> Real machines use hard-coded resources for root bridges. But when the
> >>>> resources cannot meet certain root bridges' requirements, firmware can
> >>>> save the real resource requirement per root bridge to NV storage and
> >>>> divide the resources among the root bridges in the next boot according
> >>>> to the NV settings.
> >>>> The MMIO/IO routing in the real machine I mentioned above needs to be
> >>>> fixed in a very early phase, before the PciHostBridgeDxe driver runs.
> >>>> That's to say, if [2G, 2.8G) is configured to route to root bridge #1,
> >>>> only [2G, 2.8G) is allowed to be assigned to root bridge #1. And the
> >>>> routing cannot be changed unless a platform reset is performed.
> >>
> >> I understand.
> >>
> >>>>
> >>>> Based on your description, it sounds like all the root bridges in OVMF
> >>>> share the same range of resources, and any MMIO/IO in the range can be
> >>>> routed to any root bridge. For example, every root bridge can use
> >>>> [2G, 3G) MMIO.
> >>>
> >>> Exactly. This is true for "snooping" host bridges which do not have
> >>> their own configuration registers (or MMConfig region). They are
> >>> sniffing host bridge 0 for configuration cycles, and if they are meant
> >>> for a device on a bus number owned by them, they will forward the
> >>> transaction to their primary root bus.
> >>>
> >>>> Until, in the allocation phase, root bridge #1 is assigned [2G, 2.8G),
> >>>> #2 is assigned [2.8G, 2.9G), and #3 is assigned [2.9G, 3G).
> >>
> >> Correct, but the regions do not have to be disjoint in the above scenario.
> >> Root bridge #1 can have [2G, 2.4G) and [2.8G, 3G) while root bridge #2
> >> can have [2.4G, 2.8G).
> >>
> >> This is so the firmware can distribute the resources in an optimal way.
> >> An example can be:
> >>     - root bridge #1 has a PCI device A with a huge BAR and a PCI device
> >>       B with a little BAR.
> >>     - root bridge #2 has a PCI device C with a medium BAR.
> >> The best way to distribute resources over [2G, 3G) is A's BAR, then C's
> >> BAR, and only then B's BAR.
> >>
> >>>> So it seems that we need a way to tell the PciHostBridgeDxe driver,
> >>>> from the PciHostBridgeLib, that all resources are sharable among all
> >>>> root bridges.
> >>
> >> This is exactly what we need, indeed.
> >>
> >>>>
> >>>> The real platform case is allocation per root bridge, and the OVMF
> >>>> case is allocation per PCI domain.
> >>
> >> Indeed, bare metal servers use a different PCI domain per host bridge,
> >> but I've actually seen real servers that have multiple root bridges
> >> sharing the same PCI domain, 0.
> >>
> >>
> >>>> Is my understanding correct?
> >>
> >> It is, and thank you for taking the time to understand the issue,
> >> Marcel
> >>
> >>>>
> >>> [...]
>
>