Hi,

I am sorry for the noise, I am re-sending this mail from an e-mail address 
subscribed to the list.

Thanks,
Marcel

On 02/08/2016 12:41 PM, Marcel Apfelbaum wrote:
On 02/06/2016 09:09 AM, Ni, Ruiyu wrote:
Marcel,
Please see my reply embedded below.

On 2016-02-02 19:07, Laszlo Ersek wrote:
On 02/01/16 16:07, Marcel Apfelbaum wrote:
On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
Laszlo,
I now understand your problem.
Can you tell me why OVMF needs multiple root bridge support?
My understanding of OVMF is that it is firmware which can be used in a
guest VM environment to boot an OS.
The requirement for multiple root bridges currently comes mainly from
high-end servers.
Do you mean that the VM guest needs to be like a high-end server?
Knowing this may help me think about a possible solution to your problem.
Hi Ray,

Laszlo's explanation is very good, this is not exactly about high-end VMs,
we need the extra root bridges to match assigned devices to their
corresponding NUMA node.

Regarding the OVMF issue, the main problem is that the extra root bridges
are created dynamically for the VMs (command line parameter) and their
resources are computed on the fly.

Not directly related to the above, the optimal way to allocate resources
for PCI root bridges sharing the same PCI domain is to sort the devices'
MEM/IO ranges from the biggest to the smallest and use this order during
allocation.

After the resource allocation is finished, we can build the CRS for each
PCI root bridge and pass it back to the firmware/OS.
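
The two steps above (allocate in descending size order, then derive the
per-bridge ranges) can be sketched roughly as follows. This is only an
illustration, not the OVMF or QEMU code: the function names, the device
names and the BAR sizes are all hypothetical, and it assumes that each BAR
is a power of two in size and naturally aligned.

```python
from collections import defaultdict

MIB = 1 << 20
GIB = 1 << 30


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def allocate_shared_window(bars, base, limit):
    """Place BARs from all root bridges into one shared window [base, limit).

    bars is a list of (bridge, name, size) tuples. The BARs are placed
    biggest first, each aligned to its own size (an assumption of this sketch).
    Returns a list of (bridge, name, start, end) tuples.
    """
    placed = []
    cursor = base
    for bridge, name, size in sorted(bars, key=lambda b: b[2], reverse=True):
        start = align_up(cursor, size)
        end = start + size
        if end > limit:
            raise ValueError("shared window exhausted while placing " + name)
        placed.append((bridge, name, start, end))
        cursor = end
    return placed


def build_windows(placed):
    """Group the placed ranges per bridge and merge adjacent ones.

    The result stands in for the ranges a per-bridge CRS would describe.
    """
    per_bridge = defaultdict(list)
    for bridge, _name, start, end in sorted(placed, key=lambda p: p[2]):
        ranges = per_bridge[bridge]
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return dict(per_bridge)


# Hypothetical devices: two behind root bridge 1, one behind root bridge 2.
EXAMPLE_BARS = [
    (1, "big", 512 * MIB),
    (1, "small", 4096),
    (2, "medium", 64 * MIB),
]
EXAMPLE_WINDOWS = build_windows(
    allocate_shared_window(EXAMPLE_BARS, 2 * GIB, 3 * GIB)
)
```

With these example sizes root bridge 1 ends up with two separate ranges and
root bridge 2 with one range in between, which is why the per-bridge ranges
can only be known after the allocation has run.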

While for "real" machines we can hard-code the root bridge resources in
some ROM and have them extracted early in the boot process, for the VM
world this would not be possible. Also, any effort to divide the resource
range before the resource allocation would be odd and far from optimal.

Hi Ray,
Thank you for your response,

A real machine uses hard-coded resources for its root bridges. But when the
resources cannot meet a certain root bridge's requirement, the firmware can
save the real resource requirement per root bridge to NV storage and divide
the resources among the root bridges on the next boot according to the NV
settings.
The MMIO/IO routing in the real machine I mentioned above needs to be fixed
in a very early phase, before the PciHostBridgeDxe driver runs. That is to
say, if [2G, 2.8G) is configured to route to root bridge #1, only [2G, 2.8G)
is allowed to be assigned to root bridge #1. And the routing cannot be
changed unless a platform reset is performed.

I understand.


Based on your description, it sounds like all the root bridges in OVMF share
the same range of resources and any MMIO/IO in the range can be routed to any
root bridge. For example, every root bridge can use [2G, 3G) MMIO.

Exactly. This is true for "snooping" host-bridges which do not have their own
configuration registers (or MMConfig region). They are sniffing host-bridge 0
for configuration cycles and if they are meant for a device on a bus number
owned by them, they will forward the transaction to their primary root bus.
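
As a rough illustration of the snooping behaviour described above, each
bridge can be modelled as owning a range of bus numbers and claiming the
configuration cycles that target a bus inside that range. The bridge names
and bus ranges below are made up for the example and are not taken from any
real machine or from QEMU.

```python
def claiming_bridge(bus, bridges):
    """Return the name of the bridge whose bus range contains bus.

    bridges maps a bridge name to an inclusive (first_bus, last_bus) pair.
    Returns None when no bridge owns the bus number.
    """
    for name, (first_bus, last_bus) in bridges.items():
        if first_bus <= bus <= last_bus:
            return name
    return None


# Hypothetical bus number ownership within a single PCI domain.
EXAMPLE_BRIDGES = {
    "host-bridge 0": (0x00, 0x3F),
    "snooping bridge 1": (0x40, 0x7F),
    "snooping bridge 2": (0x80, 0xBF),
}
```

In this model a configuration cycle for bus 0x45 would be forwarded by
"snooping bridge 1" to its root bus, while a cycle for bus 0xC0 is claimed
by nobody.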

Only in the allocation phase is root bridge #1 assigned [2G, 2.8G), #2
assigned [2.8G, 2.9G), and #3 assigned [2.9G, 3G).

Correct, but the regions do not have to be contiguous in the above scenario.
Root bridge #1 can have [2G, 2.4G) and [2.8G, 3G) while root bridge #2 can
have [2.4G, 2.8G).

This is so the firmware can distribute the resources in an optimal way. An
example can be:
   - root bridge #1 has a PCI device A with a huge BAR and a PCI device B
     with a small BAR.
   - root bridge #2 has a PCI device C with a medium BAR.
The best way to distribute resources over [2G, 3G) is A BAR, C BAR, and only
then B BAR.
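
To put numbers on the A, C, B ordering, here is a small sketch that compares
how much of the window is consumed when the BARs are placed bridge by bridge
(A, B, C) versus biggest first (A, C, B). The concrete sizes are assumptions
picked for illustration, as is the rule that every BAR is aligned to its own
size.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def space_used(sizes, base):
    """Place naturally aligned BARs in the given order starting at base.

    Returns the number of bytes consumed, alignment padding included.
    """
    cursor = base
    for size in sizes:
        cursor = align_up(cursor, size) + size
    return cursor - base


# Hypothetical BAR sizes: A is huge, B is small, C is medium.
BAR_A = 512 * MIB
BAR_B = 4 * KIB
BAR_C = 64 * MIB

USED_BRIDGE_BY_BRIDGE = space_used([BAR_A, BAR_B, BAR_C], 2 * GIB)
USED_BIGGEST_FIRST = space_used(
    sorted([BAR_A, BAR_B, BAR_C], reverse=True), 2 * GIB
)
PADDING_SAVED = USED_BRIDGE_BY_BRIDGE - USED_BIGGEST_FIRST
```

With these sizes the bridge-by-bridge order consumes 640 MiB, because C has
to skip ahead to the next 64 MiB boundary after B, while the biggest-first
order consumes 576 MiB plus 4 KiB and needs no padding at all.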

So it seems that we need a way for the PciHostBridgeLib to tell the
PciHostBridgeDxe driver that all resources are sharable among all root
bridges.

This is exactly what we need, indeed.


In the real platform case the allocation is per root bridge, and in the OVMF
case the allocation is per PCI domain.

Indeed, bare metal servers use a different PCI domain per host bridge, but
I've actually seen real servers that have multiple root bridges sharing the
same PCI domain, 0.


Is my understanding correct?

It is, and thank you for taking the time to understand the issue,
Marcel


[...]

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel
