On 01/28/14 21:55, Bill Paul wrote:

> I think part of my problem is that I don't quite understand how the
> firmware image volumes work either. You start out with one OVMF.fd
> image, which contains all of the firmware in compressed form. I'm
> assuming this image is mapped by QEMU into the address space such
> that there's some initial bootstrap code placed at the reset vector
> so that the CPU hits it at power-up/reset, and from there it extracts
> the contents into RAM.

Correct.

Regarding the on-disk format of the flash device, please see the commit
message

  https://github.com/tianocore/edk2/commit/b36f701d

(the "OVMF.fd after" part).

It is mapped just below 4GB (*) by qemu. See pc_system_firmware_init()
in file "hw/i386/pc_sysfw.c". We mostly care about
pc_system_flash_init() there.

(*) The size of OVMF.fd is normally 2MB for debug builds, and 1MB for
release builds. You can ask for the other size in both cases with -D
FD_SIZE_1MB and -D FD_SIZE_2MB. (See
<https://github.com/tianocore/edk2/commit/8184a764>.)
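
To make "just below 4GB" concrete, here is a small Python sketch of the arithmetic. It assumes only that the image ends exactly at the 4GB boundary; the function name is made up for illustration and is not taken from qemu.

```python
# Sketch: where a flash image of a given size starts if it must end at 4GB.
FOUR_GB = 1 << 32
MB = 1 << 20


def flash_base(image_size):
    """Return the guest-physical base address of an image that ends at 4GB."""
    return FOUR_GB - image_size


for size in (1 * MB, 2 * MB):
    print(f"{size // MB}MB image: {flash_base(size):#010x} .. {FOUR_GB - 1:#010x}")
```

Under that assumption the 2MB image simply starts 1MB lower than the 1MB image.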

The reset vector code and the SEC code are uncompressed.

OVMF's reset vector is located in OvmfPkg/ResetVector. It reuses the
"generic" edk2 reset vector when SEC+PEI are 32-bit (Ia32). When SEC+PEI
are 64-bit (X64), then the reset vector sets up initial page tables too.
(We used to keep the prebuilt page tables too in read-only flash, but
KVM didn't really like to have them there, because it wanted to write
the Accessed bits in the page table entries, even if they were all
pre-set to 1. I can't recall the exact circumstances, but I believe it
was only a problem when nested paging was supported and enabled on the
host. See <https://github.com/tianocore/edk2/commit/c90e37b5>.)

The SEC code is entered at SecCoreStartupWithStack(), called from
"OvmfPkg/Sec/X64/SecEntry.S". The C code is in "SecMain.c".

It sets up some temporary stack and heap near SEC_TOP_OF_STACK,
decompresses the one FV FFS (= Firmware Volume / Firmware File System)
file to a temporary RAM buffer (starting at 9MB) from the flash (located
below 4GB).

It finds firmware volume headers in the decompressed output. One chunk
corresponds to PEIFV, and the other corresponds to DXEFV. These are then
copied to their final places.

Later on control is transferred to PEI. The last phase of PEI will key
off the S3 status (cold boot or resume). In the former case, it will
start DXE. In the latter case, it will jump to the OS's resume vector.

At S3 resume the reset vector and the SEC code run just the same from
the flash below 4GB. The SEC code will determine if we're cold booting
or resuming from S3 sleep. In the former case, see above. In the latter
case, we won't decompress anything. First, we won't need DXE at all.
Second, we'll need PEI, but that's been decompressed before, and
protected from the OS as ACPI NVS, so we'll just jump to it.
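
The two paths can be summarized in a short Python sketch. It is only a restatement of the description above; the step strings and function names are illustrative and do not correspond to identifiers in SecMain.c.

```python
# Sketch of the cold boot vs. S3 resume decision; names are illustrative only.
def sec_steps(s3_resume):
    """List what SEC does, in order, for a cold boot or an S3 resume."""
    steps = ["set up temporary stack and heap"]
    if not s3_resume:
        # Cold boot: unpack both firmware volumes from flash into RAM.
        steps += [
            "decompress the FV FFS file from flash to a temporary RAM buffer",
            "copy PEIFV to its final place",
            "copy DXEFV to its final place",
        ]
    # On S3 resume PEIFV is still in RAM (kept as ACPI NVS), so nothing is unpacked.
    steps.append("jump to PEI")
    return steps


def pei_final_step(s3_resume):
    """What the last phase of PEI does, keyed off the S3 status."""
    return "jump to the OS's resume vector" if s3_resume else "start DXE"


for resume in (False, True):
    print("S3 resume:" if resume else "cold boot:", sec_steps(resume), pei_final_step(resume))
```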

> What I don't know is just where everything ends up in RAM.

Well in my versions of the patchset :), ie. up to v3, the series used to
start with a text file documenting the final RAM layout. Of course
that's completely obsolete now, so I'm not giving you any link lest it
confuse you.

We're between v4 and v5 now (an initial sequence from v4 has been
pushed, and AFAIK Jordan is about to post v5 of the rest). You *can*
glean the final layout from the FDF files (at the end of Jordan's
series), precisely from the spot where you've been looking anyway:

  https://github.com/jljusten/edk2/blob/ovmf-s3/OvmfPkg/OvmfPkgX64.fdf

> [FD.MEMFD]
> BaseAddress = 0x800000
> Size = 0x800000
> ErasePolarity = 1
> BlockSize = 0x10000
> NumBlocks = 0x80

So, we're basing it at 8MB, and the size is also 8MB. Within that range,
with relative start addresses:

>
> 0x000000|0x006000
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPageTablesBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPageTablesSize

These are the initial page tables built by the reset vector code,
identity-mapping the first 4GB. It comprises six 4K pages. The first two
pages host the page directories, the four other pages host the page
tables. The PTEs in there map 4GB with 2MB pages. (If I recall
correctly... 4GB/2MB == 2048 PTEs needed, 4*4KB=16384 bytes available
for PTEs, 16384/2048==8 bytes per PTE.)

So this is at 0x800000 + 0x000000 == 8MB.
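
The arithmetic in the parenthesis can be checked with a few lines of Python. This is a sketch, not the reset vector code: the flag combination is an assumption (the bit positions are the architectural x86 ones, but which of them the reset vector sets is not stated here), and the function name is hypothetical.

```python
# Sketch: the 2MB-page arithmetic for identity-mapping the first 4GB.
GB = 1 << 30
MB = 1 << 20
PAGE_SIZE = 4096
ENTRY_SIZE = 8  # bytes per 64-bit paging entry

entries_needed = (4 * GB) // (2 * MB)          # one entry per 2MB page
entries_per_page = PAGE_SIZE // ENTRY_SIZE     # entries that fit in one 4K page
pages_for_entries = entries_needed // entries_per_page

# x86 paging-entry bits: Present, Read/Write, Accessed, Dirty, Page Size (2MB).
# Which of these are really set is an assumption made for this sketch.
FLAGS = (1 << 0) | (1 << 1) | (1 << 5) | (1 << 6) | (1 << 7)


def identity_entry(index):
    """Entry number `index` maps the 2MB page at the same physical address."""
    return (index * 2 * MB) | FLAGS


print(entries_needed, entries_per_page, pages_for_entries)
print(hex(identity_entry(0)), hex(identity_entry(1)), hex(identity_entry(2047)))
```

With 512 entries per 4K page, the 2048 entries fill exactly the four pages mentioned above.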

> 0x006000|0x001000
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfLockBoxStorageBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfLockBoxStorageSize

This chunk (1 page) will be needed for internal purposes. Some data to
save across S3 sleep are prepared during DXE, before booting the OS.
Those data are separately allocated and saved in ACPI NVS regions (as
high as possible below the end of 32-bit RAM, ie. below the 32-bit PCI
hole), and they are linked into this small administrative range (which
hosts basically a linked list of pointers and sizes).

Range: 8MB+24KB to 8MB+28KB.

> 0x010000|0x008000
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamSize

This area hosts the initial (temporary) heap and stack for SEC and PEI
that I mentioned above. After PEI detects the size of available RAM
later on, it informs the PEI core about it ("installs permanent system
memory"), and then this heap and stack are dynamically relocated higher.

Range: 8MB+64KB to 8MB+96KB.

> 0x018000|0x008000
> gUefiOvmfPkgTokenSpaceGuid.PcdS3AcpiReservedMemoryBase|gEfiIntelFrameworkModulePkgTokenSpaceGuid.PcdS3AcpiReservedMemorySize

This range is not used for anything (other than reserving it from the
OS) during cold boot. During S3 resume, the temporary stack and heap are
*not* migrated to some dynamic place in the full system memory (because
that's already used by the OS). Instead, the "permanent" PEI stack and
heap are relocated to this region (which has been kept away from the
OS).

Range: 8MB+96KB to 8MB+128KB.

> 0x020000|0x0E0000
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvSize
> FV = PEIFV

This region hosts the PEI modules (after decompression), ie. it's the
final place for PEIFV.

Range: 8MB+128KB to 9MB.

> 0x100000|0x700000
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfDxeMemFvBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfDxeMemFvSize
> FV = DXEFV

This region hosts the DXE modules (after decompression), ie. it's the
final place for DXEFV.

Range: 9MB to 16MB.

For S3 purposes, we must reserve all of these as ACPI NVS, except the
last one (ie. DXE modules), because DXE is not run/reached during S3
resume.
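
As a cross-check of the ranges listed above, the following Python sketch turns the quoted FDF "offset|size" pairs into absolute addresses and verifies that they fit into MEMFD without overlapping. The numbers are copied from the quoted FDF; the labels and function names are informal and made up for this example.

```python
# Sketch: absolute ranges for the [FD.MEMFD] regions quoted above.
MEMFD_BASE = 0x800000  # BaseAddress
MEMFD_SIZE = 0x800000  # Size

# (offset, size, informal label, must be kept as ACPI NVS for S3?)
REGIONS = [
    (0x000000, 0x006000, "SEC page tables", True),
    (0x006000, 0x001000, "LockBox storage", True),
    (0x010000, 0x008000, "SEC/PEI temp RAM", True),
    (0x018000, 0x008000, "S3 reserved memory", True),
    (0x020000, 0x0E0000, "PEIFV", True),
    (0x100000, 0x700000, "DXEFV", False),
]


def absolute_ranges(regions=REGIONS, base=MEMFD_BASE):
    """Return (start, end, label, keep_for_s3) tuples with absolute addresses."""
    return [(base + off, base + off + size, label, keep)
            for off, size, label, keep in regions]


def check_layout(regions=REGIONS, total=MEMFD_SIZE):
    """Raise ValueError if regions overlap or run past the end of MEMFD."""
    previous_end = 0
    for off, size, label, _ in regions:
        if off < previous_end:
            raise ValueError(f"{label} overlaps the previous region")
        previous_end = off + size
    if previous_end > total:
        raise ValueError("regions exceed the MEMFD size")


check_layout()
for start, end, label, keep in absolute_ranges():
    kind = "ACPI NVS" if keep else "not needed at S3 resume"
    print(f"{start:#09x}..{end:#09x}  {label:<20} {kind}")
```

Note that the offsets between 0x007000 and 0x010000 are not assigned to any region in the quoted FDF, which is why the check only rejects overlaps rather than requiring the regions to be back to back.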

> You have sections marked BS_Code, BS_Data, RT_Code, RT_Data and
> LoaderCode. Is LoaderCode the guts of the firmware?

Hmmm I don't think so. Type "EfiLoaderCode" normally stands for "The
code portions of a loaded application. (Note that UEFI OS loaders are
UEFI applications.)" -- see Table 25 in the UEFI spec.

For example, the "grub2-efi" binary qualifies.

The OS can release/repurpose ranges of this type (see Table 26).

> Are you saying the PEIFV area contains yet more guts?

Internally, yes (it contains a bunch of PEI drivers), but the OS doesn't
need to know.

Same for the DXEFV range.

> And that at the time you have to decide where to put it, you don't
> know how much RAM is available yet and/or the code isn't relocatable?

Correct. When we decompress the "nameless" FV FFS file in SEC, and copy
PEIFV and DXEFV to their "final" places from the decompressed output, we
don't yet know how much RAM is available. We only determine that in one
of the PEI modules (OvmfPkg/PlatformPei/), which is code located inside
PEIFV.

(At which point we (will) also install the "permanent PEI memory",
triggering the temporary-to-permanent stack/heap migration.)

In theory, we could perhaps fetch the amount of RAM from the CMOS in SEC
too, and use an 8MB range somewhere below the PCI hole rather than at
fixed 8MB..16MB.
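
A rough Python sketch of that idea follows. Everything in it is an assumption for illustration: it supposes that CMOS registers 0x34/0x35 hold the amount of RAM above 16MB in 64KB units, it ignores the clamp to the start of the PCI hole, and the 1MB alignment and all names are arbitrary choices, not anything OVMF does today.

```python
# Sketch: place an 8MB range just below the top of low RAM (hypothetical).
MB = 1 << 20


def low_ram_size(cmos_0x34, cmos_0x35):
    """RAM size in bytes, assuming 0x34/0x35 count 64KB units above 16MB."""
    return ((((cmos_0x35 << 8) | cmos_0x34) << 16)) + 16 * MB


def memfd_base_below(top, size=8 * MB, align=1 * MB):
    """Highest `align`-aligned base such that [base, base + size) ends at or below `top`."""
    if top < size:
        raise ValueError("not enough RAM below the given top address")
    return (top - size) & ~(align - 1)


# Example: register values corresponding to a guest with 512MB of RAM.
top = low_ram_size(0x00, 0x1F)
print(hex(top), hex(memfd_base_below(top)))
```

The point of the sketch is only that the base address would become a function of the guest RAM size instead of a build-time constant.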

We certainly need Jordan to chime in here. The base address @ 8MB dates
back to a time when I wasn't around yet. Moving it to the other end of
guest RAM could regress stuff that I'm not aware of.

>> How large a contiguous range would you need from 1MB upwards?
>> (Because the address that we'd shift this up to would likely directly
>> impact the minimum qemu guest memory requirements.)
>
> Unfortunately I'm not sure I have a good answer to that question. We
> typically load the VxWorks image at 0x408000, and I think out of the
> box the 32-bit build needs about 300MB. (Yes, I know: that doesn't
> sound very embedded, does it.)

Ouch!

> But I don't think this is the right way to approach the issue either.
> Something tells me there's a better way to do what you're trying to
> do, but I don't understand enough about the problem yet to offer an
> alternate solution.

I can't of course *prove* that what OVMF does is the best way, but I'll
note that you load the VxWorks kernel at a fixed address, with a fixed
size requirement (same as we do in OVMF, basically), even though the
VxWorks kernel is higher up on the abstraction ladder.

I don't think we should even ask the question "who's right" here.

For example, I sometimes glance at #linaro-enterprise on FreeNode. The
Aarch64 Linux kernel being discussed there seems to put other
(different) address restrictions on the UEFI firmware that loads it
(<http://irclogs.linaro.org/2014/01/28/%23linaro-enterprise.html>).

This suggests that firmware, OS boot loader, and OS should find some
understanding, and that this understanding will be arbitrary (because it
can't be really justified by anything else than "well this is how our OS
works").

I assume that you boot, from under OVMF, a VxWorks-specific boot loader
(which is a UEFI application), which in turn pulls in the 300MB kernel
image at 0x408000. Is that correct?

Maybe the boot loader could "simply" call gBS->AllocatePages() with the
appropriate address hints instead.

Or, if loading occurs after ExitBootServices(), then the initial runtime
code could iterate over the UEFI memory map, and find a sufficiently
large contiguous range that consists of EfiConventionalMemory only (plus
whatever types Table 26 allows to be freed), and load the kernel there.
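
The memory map scan could look roughly like the Python sketch below. It models each descriptor as a (type name, physical start, number of 4K pages) tuple, which is a simplification of the real descriptor; the set of reusable types is one reading of the UEFI spec table, and the sample map is invented for the example.

```python
# Sketch: find the largest contiguous range the OS may reuse after ExitBootServices().
PAGE = 4096

# Types assumed to be reusable once boot services are gone.
REUSABLE = {
    "EfiConventionalMemory",
    "EfiLoaderCode",
    "EfiLoaderData",
    "EfiBootServicesCode",
    "EfiBootServicesData",
}


def largest_free_run(memory_map, min_bytes=0):
    """Return (start, end) of the largest reusable run, or None if below min_bytes."""
    best = None
    run_start = run_end = None
    for mem_type, start, pages in sorted(memory_map, key=lambda d: d[1]):
        end = start + pages * PAGE
        if mem_type not in REUSABLE:
            run_start = run_end = None      # a reserved range ends the current run
            continue
        if run_start is not None and start == run_end:
            run_end = end                   # adjacent to the current run: extend it
        else:
            run_start, run_end = start, end  # gap or first range: start a new run
        if best is None or run_end - run_start > best[1] - best[0]:
            best = (run_start, run_end)
    if best is None or best[1] - best[0] < min_bytes:
        return None
    return best


# Invented sample map: an ACPI NVS block at 8MB splits low memory in two.
SAMPLE_MAP = [
    ("EfiConventionalMemory", 0x100000, 0x700),
    ("EfiACPIMemoryNVS", 0x800000, 0x100),
    ("EfiBootServicesData", 0x900000, 0x700),
    ("EfiConventionalMemory", 0x1000000, 0x7000),
]

print(largest_free_run(SAMPLE_MAP))
```

One caveat with this approach: the code doing the scan itself lives in one of the reusable ranges (typically EfiLoaderCode / EfiLoaderData), so it would have to take care not to overwrite itself while loading the kernel.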

>
>>> It is possible to tweak things in VxWorks to avoid this problem, but
>>> it's a pain. It's also not something we typically encounter on real
>>> hardware.
>>
>> I don't think we'd like to hard-wire a *very* different base address
>> statically. Maybe we could add a build option, but that only moves
>> the pain around.
>>
>> Re it being different from real hardware, the explanation is that
>> most of OVMF's modules are stored compressed in the flash, and are
>> decompressed to (and then run from) RAM at startup. I assume on real
>> hardware the firmware simply runs from flash. (Hm, I guess it could
>> be shadowed into RAM too, but I have no data about what addresses.)
>
> I think it equally likely that you'd have compressed flash images on
> real hardware too. (We actually offer a romCompressed option with
> VxWorks, where there's no firmware on the system: there's just
> VxWorks, and it disgorges itself into RAM to execute. There is also a
> romResident option if you have enough flash/ROM to hold the whole
> image and don't mind the performance hit.)
>
> But if it's a question of just having the executable code still around
> somewhere and you can't manage that with compressed images,

(we can)

> why not create an uncompressed build option too? Yes I know it would
> take up some more address space, but that may be the only way to make
> it work.

If we kept the PEI and DXE modules uncompressed in flash, then:
- we could indeed execute them directly from below 4GB, probably,
- but we'd still need to reserve other areas,
- and the flash size would grow significantly. In the past, concerns
  were raised on the mailing list about raising the default flash size
  from 1MB to 2MB (I wasn't (and am not) aware why), but I do think such
  a jump in size would be concerning again.

>>>> Additionally, after the full S3 support series is committed, further
>>>> code will be added to honor the case when the user disables S3 on
>>>> the qemu command line ("-global PIIX4_PM.disable_s3=1"). Then the
>>>> memory allocation in question will be qualified as Boot Services
>>>> Data (rather than ACPI NVS), and the OS will be able to drop it
>>>> after transitioning to runtime.
>>>
>>> It appears I need a newer version of QEMU for that option:
>>>
>>> root@core:/home/wpaul/ovmf # qemu-system-x86_64 -global
>>> PIIX4_PM.disable_s3=1 qemu-system-x86_64: Property '.disable_s3' not
>>> found
>>
>> Correct. This property was added in
>>
>> commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332
>> Author: Gleb Natapov <g...@redhat.com>
>> Date:   Mon Jun 4 14:31:55 2012 +0300
>>
>>     Add PIIX4 properties to control PM system states.
>>
>> first released in v1.2.0.
>>
>> I searched the FreeBSD ports repo for qemu, and it seems that the
>> "qemu-devel" package is at 1.7.0. (Not sure if you can easily get it
>> in 9.1-RELEASE.)
>
> I'm sure I can shoehorn it in somehow. :)

Please note though that OVMF code to actually honor this setting will
only be written/added once the "basic" S3 functionality is complete.
(Which it is not, for the time being.)

Naturally, you can try to convince Jordan to implement that ASAP :)

(My v3 contained those patches at the end of the series. Jordan has
taken over for v4 and v5, among other things changing the memory map
significantly (for the better, I have no problems admitting that), but
we've also split off / postponed honoring the disable_s3 property for a
future, separate series.)

>
>>> That aside, this would be an acceptable compromise, at least until
>>> VxWorks supports S3 resume on the Intel architecture. :)
>>>
>>> I still think the placement of the PEIFV block is much less than
>>> ideal, but for the time being I can deal with it.
>>
>> Alternatively, please propose the lowest address that would work out
>> of the box for your use case, and then Jordan could decide if it was
>> reasonable to re-wire the FDFs with that address.
>
> I don't think assuming 300MB of RAM is reasonable

Fully agreed :)

> so I don't think that will work. Maybe once I've read more of the code
> I can suggest a better idea.

The in-tree code that fetches the amount of RAM from CMOS is in
"OvmfPkg/PlatformPei/MemDetect.c", and as I said it runs in PEI.

The decompression code is in "OvmfPkg/Sec/SecMain.c". It runs in SEC,
before PEI.

I think any proposed changes should be synchronized with Jordan's S3
series (v5 is coming soon; see the "ovmf-s3" branch reference above),
because I believe it's going to rework some of the MemDetect /
reservation bits.

Thanks,
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
