OK, attached is another dose of technical ideas for implementing FreeMWare. This stuff is important, as it relates to the next wave of enhancements to our code. If you're into the techie side of FreeMWare, I'd appreciate your reactions.

-Kevin
Hey, I'm thinking about taking the next step with FreeMWare and adding our own private page-table mappings, which we'll need as part of the overall virtualization strategy. To this end, there are some issues worth talking about that relate to the linear address space of the monitor and the guest OS. BTW, currently I just use the host page tables etc. and don't change CR3 (PDBR) on switches to/from the host and monitor, though you'll notice there's a little code in there in preparation for the paging stuff.

MAPPING THE MONITOR'S GDT AND IDT INTO THE GUEST LINEAR SPACE
=============================================================

The first issue is mapping the monitor's IDT and GDT into the linear address space used by both the monitor and the guest OS. The monitor is never directly invoked by guest code; it is only invoked via interrupts and exceptions, which gate through the monitor's IDT. The gate descriptors in the IDT hold selectors that refer to code segment descriptors in the GDT. So let's look at both, with respect to where to map them.

Since our IDT and GDT must occupy the same linear address space as the guest code that is normally executing, we need mechanisms that allow the two to cohabit.

Another point worth noting is that the SGDT and SIDT instructions are not protected, so ring3 (user) code may execute them. Each returns a limit and a base address, the base being a pure linear address, independent of the code and data segment base addresses. To offer truly precise virtualization, in the sense that the user program cannot detect that we have influenced the linear base address at which these structures are stored, we could use either of the following two approaches. (Generally, our thinking so far is to use the pre-scanning technique to virtualize instructions at ring0. There may be some optimizations here specific to certain guest OSes, but for now I'll leave it at that.)
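For reference, in 32-bit protected mode SGDT and SIDT store a 6-byte value: a 16-bit limit followed by a 32-bit linear base. Below is a minimal C sketch of how a monitor that intercepts these instructions (for example via pre-scanning) might write back the values the guest OS expects instead of its own. `struct table_reg`, `struct guest_tables` and `emulate_store_table_reg` are hypothetical names, not existing FreeMWare code, and decoding the instruction's memory operand is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Table-register contents as the guest OS believes it loaded them with
 * LGDT/LIDT. Hypothetical structure, for illustration only. */
struct table_reg {
    uint16_t limit;   /* size of the table in bytes, minus 1 */
    uint32_t base;    /* linear base address of the table */
};

struct guest_tables {
    struct table_reg gdtr;
    struct table_reg idtr;
};

/* Emulate a 32-bit SGDT (is_idt == 0) or SIDT (is_idt != 0): write the
 * guest's expected limit and base, not the monitor's real values, into
 * the 6-byte memory operand, little-endian as on x86. */
static void emulate_store_table_reg(const struct guest_tables *g, int is_idt,
                                    uint8_t dest[6])
{
    assert(g != NULL && dest != NULL);
    const struct table_reg *r = is_idt ? &g->idtr : &g->gdtr;

    dest[0] = (uint8_t)(r->limit & 0xff);
    dest[1] = (uint8_t)(r->limit >> 8);
    dest[2] = (uint8_t)(r->base & 0xff);
    dest[3] = (uint8_t)((r->base >> 8) & 0xff);
    dest[4] = (uint8_t)((r->base >> 16) & 0xff);
    dest[5] = (uint8_t)(r->base >> 24);
}
```

Serializing byte by byte avoids relying on a packed struct layout, so the 6-byte format is explicit in the code.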
Approach #1: For user code, if we are performing the pre-scanning technique, we could simply virtualize the SGDT and SIDT instructions and emulate them to return the values the guest code expects. In this case we can place the GDT and IDT structures anywhere in linear memory that is not currently used by either guest-OS or guest-user code. We do have access to the guest page tables, so it is fairly easy to find a free area. However, we would have to page-protect the areas of memory that contain what the guest OS thinks are the real GDT and IDT, and use the resulting faults as the opportunity to update the real ones used by the monitor.

Approach #2: If we wanted to let these 2 instructions execute without virtualized intervention and still yield accurate results with respect to the returned base address, then we could actually place the GDT and IDT structures at their expected linear addresses. Since we need to page-protect the GDT and IDT from access by guest-OS code anyway, so that we can virtualize these structures, we might as well place ours where the guest OS thinks they should be. Keep in mind that both the guest-OS and guest-app code will be pushed down to ring3, so they will generate a page fault upon trying to access the areas of memory containing the GDT and IDT, which we of course protected. This gives us a chance to do something smart with the access.

MAPPING THE ACTUAL MONITOR INTERRUPT HANDLER CODE INTO THE GUEST LINEAR SPACE
=============================================================================

Now that we've discussed placing the GDT and IDT in linear memory, we need to map the actual interrupt handler code as well. Since we will be virtualizing the IDT and GDT, the guest OS will not see our segment descriptors and selectors, so we have some freedom here. We can place this code (by page-mapping it) into an unused linear address range, again because we have access to the guest-OS page tables.
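Finding an unused range can be sketched as a scan of the guest's page directory. This is a minimal sketch assuming classic 32-bit, non-PAE paging, where each of the 1024 page-directory entries covers 4 MB and bit 0 is the present bit. It treats any not-present entry as free, which is a simplification: the guest may map that range later, and each guest process has its own page directory for its user space. `find_free_linear_range` and the `PDE_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_PRESENT 0x1u                   /* P bit of a page-directory entry */
#define PDE_COUNT   1024u                  /* entries in a non-PAE page directory */
#define PDE_SPAN    (4u * 1024u * 1024u)   /* linear bytes covered by one PDE */

/* Look for `bytes` of linear space whose page-directory entries are all
 * not-present in the guest's page directory, at 4 MB (one PDE) granularity.
 * Returns 0 and stores the start address in *linear_out, or -1 if no run
 * of free entries is long enough. */
static int find_free_linear_range(const uint32_t pdir[PDE_COUNT],
                                  uint32_t bytes, uint32_t *linear_out)
{
    assert(pdir != NULL && linear_out != NULL);

    uint32_t need = bytes / PDE_SPAN + (bytes % PDE_SPAN != 0);
    if (need == 0)
        need = 1;

    uint32_t run = 0;  /* length of the current run of not-present PDEs */
    for (uint32_t i = 0; i < PDE_COUNT; i++) {
        if (pdir[i] & PDE_PRESENT) {
            run = 0;
            continue;
        }
        if (++run == need) {
            *linear_out = (i + 1 - need) * PDE_SPAN;
            return 0;
        }
    }
    return -1;
}
```

For example, if only the entries at indices 0x188 and 0x189 are not present, a 64 KB request returns 0x62000000, the free range used in the example in the next paragraph.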
The interrupt handler code is actually just code linked into our host-OS kernel module. The consideration here is that code generated by the compiler is based on offsets from the code and data segments. This code will not call functions in the host-OS kernel and, when used in the monitor/guest context, should confine its accesses to its own code and data. So we must set the monitor's code and data segment base addresses such that the offsets make sense, given the linear address at which we map the code. For example, let's say our host OS normally uses a CS segment base of 0xc0000000 (like earlier Linux kernels) and our kernel module lives in the range 0xc2000000 .. 0xc200ffff, i.e. at offsets 0x02000000 .. 0x0200ffff within the kernel segments. Then let's say that, based on empty areas in the guest OS's page tables, we find a free range at 0x62000000 .. 0x6200ffff. We would give the code and data descriptors for our interrupt handler a base of 0x60000000 (0x62000000 - 0x02000000), so that the offsets remain consistent with the kernel module code. And of course, we mark these pages as supervisor, so that if they are accessed by the guest OS (which runs at ring3), a fault will occur.

We will also be virtualizing the guest-OS page tables, protecting that area of memory so that we can react to changes. Thus, we will know when the guest OS makes updates to its page tables. This gives us a perfect opportunity to detect when an area of memory is no longer free. If the guest OS marks a linear address range as in use, and that conflicts with the range we are using for our monitor code, we can simply change the code and data segment descriptor base addresses and remap the handler code to another linear address range that is currently free. No memory is copied; only the address mappings change. This overhead occurs only when we find we are no longer living in free memory. To reduce it even further, we could start out at, and relocate among, a set of well-known alternative addresses.
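The base-address arithmetic and the relocation step can be sketched in C as follows. This is a minimal sketch under stated assumptions: `struct monitor_map`, `monitor_seg_base`, `set_descriptor_base`, `ranges_overlap` and `on_guest_map` are hypothetical names; the candidate array stands in for the well-known alternative addresses; and `on_guest_map` checks a candidate only against the newly mapped guest range, whereas a real monitor would also confirm the candidate is free in the guest page tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for where the monitor's code lives. */
struct monitor_map {
    uint32_t host_seg_base;  /* host kernel CS/DS base, e.g. 0xc0000000 */
    uint32_t module_linear;  /* module's linear address in the host space */
    uint32_t size;           /* bytes occupied by the module */
    uint32_t guest_linear;   /* where the module is mapped in the guest space */
};

/* The compiler-generated offsets are module_linear - host_seg_base.
 * Keep them valid by choosing base = guest_linear - offset. */
static uint32_t monitor_seg_base(const struct monitor_map *m)
{
    assert(m->module_linear >= m->host_seg_base);
    uint32_t offset = m->module_linear - m->host_seg_base;
    return m->guest_linear - offset;
}

/* Store a 32-bit base into an 8-byte x86 segment descriptor, which splits
 * it across bytes 2-3 (bits 0-15), byte 4 (bits 16-23), byte 7 (bits 24-31). */
static void set_descriptor_base(uint8_t desc[8], uint32_t base)
{
    desc[2] = (uint8_t)(base & 0xff);
    desc[3] = (uint8_t)((base >> 8) & 0xff);
    desc[4] = (uint8_t)((base >> 16) & 0xff);
    desc[7] = (uint8_t)(base >> 24);
}

/* Half-open ranges [a, a+alen) and [b, b+blen); 64-bit math avoids wraparound. */
static int ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    return (uint64_t)a < (uint64_t)b + blen && (uint64_t)b < (uint64_t)a + alen;
}

/* Called when a trapped guest page-table write maps [start, start+len).
 * Returns 0 if the monitor is unaffected, 1 if it moved to the first
 * candidate address that avoids the new mapping, -1 if none does. */
static int on_guest_map(struct monitor_map *m, uint32_t start, uint32_t len,
                        const uint32_t *candidates, int ncand)
{
    if (!ranges_overlap(m->guest_linear, m->size, start, len))
        return 0;
    for (int i = 0; i < ncand; i++) {
        if (!ranges_overlap(candidates[i], m->size, start, len)) {
            m->guest_linear = candidates[i];
            /* Next steps (not shown): remap the module's pages at the new
             * address and call set_descriptor_base() with
             * monitor_seg_base(m) for the code and data descriptors.
             * No memory is copied. */
            return 1;
        }
    }
    return -1;
}
```

With the numbers from the example above (host base 0xc0000000, module at 0xc2000000, mapped at 0x62000000), `monitor_seg_base` yields 0x60000000. If the guest later maps 0x62008000 and 0x72000000 is the next candidate, the module moves there and the new base becomes 0x70000000.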
The addresses we use could be ones that particular guest OSes are unlikely to use.

-Kevin
