Marcel,
Please see my reply embedded below.

On 2016-02-02 19:07, Laszlo Ersek wrote:
On 02/01/16 16:07, Marcel Apfelbaum wrote:
On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
Laszlo,
I now understand your problem.
Can you tell me why OVMF needs multiple root bridges support?
My understanding to OVMF is it's a firmware which can be used in a
guest VM
environment to boot OS.
Multiple root bridges requirement currently mainly comes from high-end
servers.
Do you mean that the VM guest needs to be like a high-end server?
This may help me to think about the possible solution to your problem.
Hi Ray,

Laszlo's explanation is very good, this is not exactly about high-end VMs,
we need the extra root bridges to match assigned devices to their
corresponding NUMA node.

Regarding the OVMF issue, the main problem is that the extra root
bridges are created dynamically
for the VMs (command line parameter) and their resources are computed on
the fly.

Not directly related to the above, the optimal way to allocate resources
for PCI root bridges
sharing the same PCI domain is to sort devices MEM/IO ranges from the
biggest to smallest
and use this order during allocation.

After the resources allocation is finished we can build the CRS for each
PCI root bridge
and pass it back to firmware/OS.

While for "real" machines we can hard-code the root bridge resources in
some ROM and have it
extracted early in the boot process, for the VM world this would not be
possible. Also
any effort to divide the resources range before the resource allocation
would be odd and far from optimal.
Real machine uses hard-code resources for root bridges. But when the resource cannot meet certain root bridges' requirement, firmware can save the real resource requirement per root bridges to NV storage and divide the resources to each root
bridge in next boot according to the NV settings.
The MMIO/IO routine in the real machine I mentioned above needs to be fixed
in a very earlier phase before the PciHostBridgeDxe driver runs. That's to say if [2G, 2.8G) is configured to route to root bridge #1, only [2G, 2.8G) is allowed to assigned to root bride #1. And the routine cannot be changed unless a platform
reset is performed.

Based on your description, it sounds like all the root bridges in OVMF share the
same range of resource and any MMIO/IO in the range can be route to any root
bridge. For example, every root bridge can use [2G, 3G) MMIO. Until in
allocation phase, root bridge #1 is assigned to [2G, 2.8G), #2 is assigned
to [2.8G, 2.9G), #3 is assigned to [2.9G, 3G).
So it seems that we need a way to tell PciHostBridgeDxe driver from the PciHostBridgeLib
that all resources are sharable among all root bridges.

The real platform case is the allocation per root bridge and OVMF case is the allocation
per PCI domain.
Is my understanding correct?

Regarding a possible solution, I first need to understand why the
resource allocation is done
per PCI root bridge and not per PCI domain. The CRS allows a PCI root
bridge to have several MEM/IO ranges
so there is really no need to impose a per PCI root bridge logic.

I am new to the edk2 project so I might get things wrong, but I think we
need a way to specify if
the PCI root bridges will supply their resources or if an external
allocator will do the job.
Laszlo proposed solution looks like a way to implement such a policy, I
am personally OK with it.

I really think the generic PciHostBridgeDxe driver is the right way to
go also for OVMF, we just need
a way to deal with this issue.
For the sake of discussion, I pushed my WIP patches to:

https://github.com/lersek/edk2/commits/pci_host_wip

It is a faithful conversion of OvmfPkg/PciHostBridgeDxe/, to
MdeModulePkg's PciHostBridgeDxe, plus an OVMF-specific PciHostBridgeLib
instance.

The problem is of course patch

   MdeModulePkg: PciHostBridgeDxe: allow PciHostBridgeLib to prime the
   GCD map

which we've been discussing in this thread.

Thanks
Laszlo


-----Original Message-----
From: Laszlo Ersek [mailto:[email protected]]
Sent: Tuesday, January 26, 2016 11:10 AM
To: Ni, Ruiyu <[email protected]>
Cc: [email protected]; Tian, Feng <[email protected]>; Fan,
Jeff <[email protected]>; Justen, Jordan L
<[email protected]>; Marcel Apfelbaum <[email protected]>
Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
PciHostBridgeDxe driver.

On 01/26/16 03:42, Ni, Ruiyu wrote:
Laszlo,
Thanks for the detailed explanation and I agrees with your idea that
a generic
driver should consider the case that part of the IO/MMIO resource may be
already added by platform, no matter by PEI or by DXE.

So I think we may not need the flag you proposed here. To achieve the
maximum
flexibility, we can use gDS.GetMemorySpaceMap/GetIoSpaceMap to
retrieve all
the current MMIO/IO space map and only add those ranges that are not
added.

And to be more robust, we can check that the range specified to the
root bridge
should not be allocated by anyone (CAN be added by someone).

What's your opinion?
Very good, but painful questions. :)

Until now, OVMF has relied upon PlatformPei producing the HOBs in
question (where the base of the 32-bit PCI MMIO window varies in the
HOB, based on guest RAM size), *and* OVMF's PCI host bridge driver has
never even tried to match the root bridges' 32-bit MMIO apertures to
anything at all. All of those apertures are set as [2GB, 4GB).

The idea being, the gDS->AllocateMemorySpace() calls can be satisfied
from whatever room is left from the range that the HOB identified.
Especially in the case of multiple root bridges, I wouldn't know *how*
to divide up the MMIO address space between them. (Dividing up the bus
range was hard enough.)

Same for IO ports -- all root bridges share the same [0xC000, 0xFFFF]
port range.

Therefore your idea above would immediately break OVMF, functionally :)

(I suspect that OVMF is kinda broken already, in this aspect, but it
happens to work.)

I know precious little about PCI resource allocation. This is one of the
areas we intend to look into seriously in the future, with Marcel
(CC'd). Thus, if porting OVMF to the generic PciHostBridgeDxe driver
requires incompatible changes at once, then I think we should postpone
the port until Marcel returns from his vacation.

To reiterate, my current problem is ultimately the lack of an algorithm
for dynamically dividing up the IO and MMIO space between root bridges.
Do you have an idea for that? (The algorithm, if there is one, could
very well be specific to QEMU -- which too is where we'll need Marcel's
input.)

Thanks!
Laszlo


Regards,
Ray


-----Original Message-----
From: Laszlo Ersek [mailto:[email protected]]
Sent: Tuesday, January 26, 2016 10:20 AM
To: Ni, Ruiyu <[email protected]>
Cc: [email protected]; Tian, Feng <[email protected]>; Fan,
Jeff <[email protected]>; Justen, Jordan L <[email protected]>
Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
PciHostBridgeDxe driver.

Hi Ray,

I have a work-in-progress series to convert OvmfPkg to use this driver.
While testing it, I ran into a failed assertion:

On 01/13/16 09:00, Ruiyu Ni wrote:
This driver links to PciHostBridgeLib provided by platform/silicon to
produce PciRootBridgeIo and PciHostBridgeResourceAllocation protocol.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ruiyu Ni <[email protected]>
Cc: Jeff Fan <[email protected]>
Cc: Feng Tian <[email protected]>
---
   .../Bus/Pci/PciHostBridgeDxe/PciHostBridge.c       | 1134
++++++++++++++
   .../Bus/Pci/PciHostBridgeDxe/PciHostBridge.h       |  252 ++++
   .../Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf  |   55 +
   .../Bus/Pci/PciHostBridgeDxe/PciHostResource.h     |   47 +
   .../Bus/Pci/PciHostBridgeDxe/PciRootBridge.h       |  568 +++++++
   .../Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c     | 1561
++++++++++++++++++++
   MdeModulePkg/MdeModulePkg.dsc                      |    2 +
   7 files changed, 3619 insertions(+)
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.h
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostResource.h
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridge.h
   create mode 100644
MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c

diff --git a/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
b/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
new file mode 100644
index 0000000..08285d8
--- /dev/null
+++ b/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
@@ -0,0 +1,1134 @@
+/** @file
+
+  Provides the basic interfaces to abstract a PCI Host Bridge
Resource Allocation.
+
+Copyright (c) 1999 - 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials
+are licensed and made available under the terms and conditions of
the BSD License
+which accompanies this distribution.  The full text of the license
may be found at
+http://opensource.org/licenses/bsd-license.php
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS
OR IMPLIED.
+
+**/
+
+#include "PciHostBridge.h"
+#include "PciRootBridge.h"
+#include "PciHostResource.h"
+
+
+EFI_METRONOME_ARCH_PROTOCOL *mMetronome;
+EFI_CPU_IO2_PROTOCOL        *mCpuIo;
+
+GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mAcpiAddressSpaceTypeStr[] = {
+  L"Mem", L"I/O", L"Bus"
+};
+GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mPciResourceTypeStr[] = {
+  L"I/O", L"Mem", L"PMem", L"Mem64", L"PMem64", L"Bus"
+};
+
+/**
+
+  Entry point of this driver.
+
+  @param ImageHandle  Image handle of this driver.
+  @param SystemTable  Pointer to standard EFI system table.
+
+  @retval EFI_SUCCESS       Succeed.
+  @retval EFI_DEVICE_ERROR  Fail to install PCI_ROOT_BRIDGE_IO
protocol.
+
+**/
+EFI_STATUS
+EFIAPI
+InitializePciHostBridge (
+  IN EFI_HANDLE         ImageHandle,
+  IN EFI_SYSTEM_TABLE   *SystemTable
+  )
+{
+  EFI_STATUS                  Status;
+  PCI_HOST_BRIDGE_INSTANCE    *HostBridge;
+  PCI_ROOT_BRIDGE_INSTANCE    *RootBridge;
+  PCI_ROOT_BRIDGE             *RootBridges;
+  UINTN                       RootBridgeCount;
+  UINTN                       Index;
+  PCI_ROOT_BRIDGE_APERTURE    *MemApertures[4];
+  UINTN                       MemApertureIndex;
+
+  RootBridges = PciHostBridgeGetRootBridges (&RootBridgeCount);
+  if ((RootBridges == NULL) || (RootBridgeCount == 0)) {
+    return EFI_UNSUPPORTED;
+  }
+
+  Status = gBS->LocateProtocol (&gEfiMetronomeArchProtocolGuid,
NULL, (VOID **) &mMetronome);
+  ASSERT_EFI_ERROR (Status);
+  Status = gBS->LocateProtocol (&gEfiCpuIo2ProtocolGuid, NULL,
(VOID **) &mCpuIo);
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Most systems in the world including complex servers have only
one Host Bridge.
+  //
+  HostBridge = AllocateZeroPool (sizeof (PCI_HOST_BRIDGE_INSTANCE));
+  ASSERT (HostBridge != NULL);
+
+  HostBridge->Signature        = PCI_HOST_BRIDGE_SIGNATURE;
+  HostBridge->CanRestarted     = TRUE;
+  InitializeListHead (&HostBridge->RootBridges);
+
+  HostBridge->ResAlloc.NotifyPhase          = NotifyPhase;
+  HostBridge->ResAlloc.GetNextRootBridge    = GetNextRootBridge;
+  HostBridge->ResAlloc.GetAllocAttributes   = GetAttributes;
+  HostBridge->ResAlloc.StartBusEnumeration  = StartBusEnumeration;
+  HostBridge->ResAlloc.SetBusNumbers        = SetBusNumbers;
+  HostBridge->ResAlloc.SubmitResources      = SubmitResources;
+  HostBridge->ResAlloc.GetProposedResources = GetProposedResources;
+  HostBridge->ResAlloc.PreprocessController = PreprocessController;
+
+  Status = gBS->InstallMultipleProtocolInterfaces (
+                  &HostBridge->Handle,
+                  &gEfiPciHostBridgeResourceAllocationProtocolGuid,
&HostBridge->ResAlloc,
+                  NULL
+                  );
+  if (EFI_ERROR (Status)) {
+    FreePool (HostBridge);
+    PciHostBridgeFreeRootBridges (RootBridges, RootBridgeCount);
+    return Status;
+  }
+
+  //
+  // Create Root Bridge Device Handle in this Host Bridge
+  //
+  for (Index = 0; Index < RootBridgeCount; Index++) {
+    //
+    // Create Root Bridge Handle Instance
+    //
+    RootBridge = CreateRootBridge (&RootBridges[Index],
HostBridge->Handle);
+    ASSERT (RootBridge != NULL);
+    if (RootBridge == NULL) {
+      continue;
+    }
+
+    if (RootBridges[Index].Io.Limit > RootBridges[Index].Io.Base) {
+      Status = gDS->AddIoSpace (
+                      EfiGcdIoTypeIo,
+                      RootBridges[Index].Io.Base,
+                      RootBridges[Index].Io.Limit -
RootBridges[Index].Io.Base + 1
+                      );
+      ASSERT_EFI_ERROR (Status);
+    }
This assertion fails here with with "Access Denied".

The reason is that we add IO and MMIO space still in PEI, as resource
descriptor HOBs, and then the DXE core primes the GCD memory space map
from them. Adding the same IO space here with the appropriate DXE
service runs into a conflict justifiedly, and it is rejected.

My question is if we could make the GCD memory space map manipulation
optional in this driver:

The PCI_ROOT_BRIDGE structure could be extended with a UINT32 flag. For
each of Io, Mem, MemAbove4G, PMem, PMemAbove4G, the PciHostBridgeLib
instance producing the PCI_ROOT_BRIDGE structure could set a separate
bit, indicating whether PciHostBridgeDxe should add (and set the
attributes of) the given aperture to the GCD memory space map.

Since this is a generic driver, I think it should accommodate the case
when the relevant apertures are described during PEI. (Not just because
that's what OVMF does, but because said practice is valid, as far as I
understand / recall the PI specs.)

In my (limited) experience it varies between PCI host bridge drivers
whether they install the apertures themselves, or rely on them already
being present.

In ArmVirtPkg/PciHostBridgeDxe/, the driver installs the (emulated) IO
and (real) MMIO space itself; while in OvmfPkg (and the original
PcAtChipsetPkg driver), this does not happen.

If you agree with the idea, I can try writing a patch for it.

Alternatively, you could require from the PciHostBridgeGetRootBridges()
API that on return, all the necessary GCD memory space map entries be in
place. At the moment, both approaches would work for me.

... Hm, I was too curious, and wrote the patch just now. OVMF (ported to
PciHostBridgeDxe) works well with it; I retested the multiple root
bridges case too, and compared the OVMF debug log & the Linux guest
dmesg before<->after. The resource assignments are the same. I also
regression tested a Windows 8.1 guest, with 6GB RAM.

So, what do you think of the attached patch? If you are okay with it, I
can submit it as part of my upcoming OVMF series.

Thanks!
Laszlo

[...]

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel


_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to