The other day, Andi discovered an interesting startup performance issue on Mac OS, which he reported as bug 4579:

    http://bugzilla.osafoundation.org/show_bug.cgi?id=4579

The simplest way to reproduce the behavior he observed is to add a line like this:

    print "end of Block.py"

to the end of the osaf.framework.blocks.Block module, and then add:

    print "back from Block"

to osaf.framework.scripting, after the import of the Block module, like so:

    import osaf.framework.blocks.Block as Block
    print "back from Block"   # <-- add me
    import osaf.framework.blocks.ControlBlocks as ControlBlocks

If you then do:

    RunPython -c "import osaf.framework.scripting"

you will see a brief delay, followed by "end of Block.py", another delay, "back from Block", and another delay before the command prompt returns.

On my PC, when Andi first reported this issue, I measured a total execution time of about 4 seconds, with a little over 1 second between the two prints. So the delay is not unique to the Mac.
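For anyone who wants a number rather than a stopwatch impression, here is a minimal sketch of timing a single import. The helper name time_import is hypothetical, and the code is written for a current Python 3 interpreter rather than the Python used by the print statements above. Note that dropping a module from sys.modules does not unload a DLL or PYD that the process has already mapped, so only the first call in a fresh interpreter reflects the library loading cost.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name."""
    # Drop any cached module object so the import machinery runs again.
    # (Submodules and already-mapped DLLs stay loaded; use a fresh
    # process to measure true cold-start cost.)
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling time_import("osaf.framework.blocks.Block") once at the start of a fresh interpreter session would give a figure comparable to the delay described above.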

Following today's performance discussion at the services/dev/repository group meeting, I did some research into Windows' handling of DLL loading, and found out a few interesting things.

First, every .dll (and .pyd) has a "preferred address" for loading. A DLL or PYD that is loaded at its preferred address will be lazily loaded or "demand paged". This means that Windows will not load any part of the DLL into memory until it is actually needed. Also, it will not require any "fixups" to change addresses of calls within the DLL's code.
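The preferred address is stored in the file itself, as the ImageBase field of the PE optional header, next to a SizeOfImage field giving how much address space the image occupies. The following is a minimal sketch of reading both fields with only the standard library. The function name read_preferred_base is hypothetical, and the sketch handles only the plain PE32 and PE32+ header layouts.

```python
import struct


def read_preferred_base(data):
    """Return (image_base, size_of_image) from the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:
        # PE32: ImageBase is a 4-byte field at offset 28.
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:
        # PE32+: ImageBase is an 8-byte field at offset 24.
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    # SizeOfImage sits at offset 56 in both layouts.
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return image_base, size_of_image
```

Reading the first kilobyte or so of each .dll and .pyd file and passing it to this function should be enough to list the preferred address of every library in a directory.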

However, if two DLL or PYD files have preferred address ranges that overlap, one of them must be relocated at runtime, and the relocated DLL is not "demand paged". Instead, the whole DLL is loaded immediately, even if Chandler does not yet need all of the DLL's code. In addition, Windows has to "fix up" addresses in the DLL during this process. So, the performance cost in both disk I/O and CPU use for loading a relocated DLL is proportional to the size of the library.
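To make "overlap" concrete: two images collide when their [base, base + size) address ranges intersect. Below is a minimal sketch of checking a set of libraries for such collisions. The function name find_collisions and the tuple layout are assumptions made for illustration, and the file names and addresses in the usage note are made up.

```python
def find_collisions(images):
    """Return pairs of names whose [base, base + size) ranges overlap.

    images is a list of (name, base, size) tuples.
    """
    collisions = []
    # Sort by base address so each image only needs comparing with later ones.
    ordered = sorted(images, key=lambda image: image[1])
    for i, (name, base, size) in enumerate(ordered):
        end = base + size
        for other_name, other_base, _ in ordered[i + 1:]:
            if other_base >= end:
                break  # sorted by base, so nothing later can overlap either
            collisions.append((name, other_name))
    return collisions
```

For example, find_collisions([("a.dll", 0x10000000, 0x20000), ("b.pyd", 0x10000000, 0x8000)]) reports the pair ("a.dll", "b.pyd"), since both ask for the same starting address.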

As it happens, some of the largest DLLs we use have default addresses that overlap those of other libraries we use, which means that we can end up paying heavy relocation costs at Chandler start time.

Luckily, there is a simple way to change most DLLs' preferred addresses, by modifying the DLL on disk. The Windows API includes a function that can be invoked to relocate a DLL file to a specified address, and I found a short Python script that can use it to "rebase" all of the DLLs and PYDs in a given directory tree:

   http://mail.python.org/pipermail/spambayes-dev/2004-July/002942.html

I modified this script, replacing the 'if __name__=="__main__"' block with the following line:

   rebase(True, 0x70000000, *get_images("release\\bin"))

This reset the preferred addresses of all our DLLs and PYDs so that they no longer overlapped. Only three files were not relocated: python24.dll, icudt32.dll, and icudt34.dll. I'm not sure why the python24.dll file can't be relocated, but all the other libraries were moved out of its way, so it probably doesn't matter much. The other two DLLs appear to be just wrappers for data, and so do not seem to really have a "preferred address". Hopefully this means that they are demand-paged regardless of location.

After performing the rebasing, I then reran my earlier test. The delay between the two prints dropped to roughly half a second, and the total test time dropped to about 3 seconds. This strongly suggests that both the "hang" of bug 4579 and total startup time are significantly affected by collisions in DLLs' preferred addresses, and that it's worth investigating whether the use of a rebasing tool can be incorporated as either a build step or post-installation step.

Since the rebasing operation can be performed by Python code (as long as the targeted DLLs are not actually in use), this means we can potentially apply the optimization to third-party plugins as well as our own code. Indeed, I'm thinking it might be handy to have a way for the Python Eggs runtime to perform rebasing automatically under certain circumstances, although that won't happen for some time if at all.

At this point, the rebasing script is not suitable for production use: it assumes a particular start address and a particular directory, and it allocates new addresses by moving up in memory, even though the Microsoft-recommended direction for DLL (as opposed to EXE) allocation is downward in memory. I will probably do some more research on this issue to determine whether the script should be changed to use this other approach.
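The difference between the two allocation directions can be sketched as a pure address-planning step, separate from the actual rebasing call. This is not how the script linked above works internally. The names plan_bases, round_up, and GRANULARITY are hypothetical, and the sketch assumes the start address is already aligned to 64 KB (PE image bases must be a multiple of 64 KB).

```python
GRANULARITY = 0x10000  # 64 KB; image bases must be a multiple of this


def round_up(value, step=GRANULARITY):
    """Round value up to the next multiple of step."""
    return (value + step - 1) // step * step


def plan_bases(images, start, downward=False):
    """Assign a non-overlapping base address to each (name, size) pair.

    start is the first base when moving upward, or the (exclusive)
    upper limit of the first image when moving downward.
    """
    plan = {}
    cursor = start
    for name, size in images:
        span = round_up(size)
        if downward:
            cursor -= span        # reserve space below the previous image
            plan[name] = cursor
        else:
            plan[name] = cursor   # place the image, then step past it
            cursor += span
    return plan
```

With downward=True, each library is placed just below the previous one, which matches the direction recommended for DLLs. The default mirrors the upward allocation the current script performs.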

But the performance improvement definitely suggests that a more formal investigation would be worthwhile: for example, running the script to rebase the code on the Windows performance tinderbox, and then measuring the change in startup performance. It also suggests that we should investigate whether Mac OS has similar issues with base address collisions when dynamic linking, since the PC pauses at a similar point during startup, and rebasing reduces that pause.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev
