The other day, Andi discovered an interesting startup performance issue on
Mac OS, which he reported as bug 4579:
http://bugzilla.osafoundation.org/show_bug.cgi?id=4579
The simplest way to reproduce the behavior he observed is to add a line
like this:
print "end of Block.py"
to the end of the osaf.framework.blocks.Block module, and then add:
print "back from Block"
to osaf.framework.scripting, after the import of the Block module, like so:
import osaf.framework.blocks.Block as Block
print "back from Block" # <-- add me
import osaf.framework.blocks.ControlBlocks as ControlBlocks
If you then do:
RunPython -c "import osaf.framework.scripting"
you will see a brief delay, followed by "end of Block.py", another delay,
"back from Block", and another delay before the command prompt returns.
When Andi first reported this issue, I timed it on my PC: the total
execution time was about 4 seconds, and the delay between the two prints
was a little over 1 second. So the delay is not unique to the Mac.
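To get numbers for those pauses without judging them by eye, the two print statements could be replaced with a small helper that puts the elapsed time in front of each marker. The `mark()` function below is hypothetical (it is not part of Chandler). It would need to live in some module that both Block.py and osaf.framework.scripting can import, and it measures from the moment that module is first imported, not from process start.

```python
import sys
import time

# Reference point for every mark. It is taken when this helper module is
# first imported, not when the Python process starts.
_START = time.time()


def mark(label, stream=None):
    """Write `label` prefixed by seconds elapsed since _START; return that time."""
    if stream is None:
        stream = sys.stdout
    elapsed = time.time() - _START
    stream.write("%7.3fs %s\n" % (elapsed, label))
    return elapsed
```

Call `mark("end of Block.py")` at the bottom of Block.py and `mark("back from Block")` after the import in osaf.framework.scripting. The gap between the two timestamps is then the pause in question.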
Following today's performance discussion at the services/dev/repository
group meeting, I did some research into Windows' handling of DLL loading,
and found out a few interesting things.
First, every .dll (and .pyd) has a "preferred address" for loading. A DLL
or PYD that is loaded at its preferred address will be lazily loaded or
"demand paged". This means that Windows will not load any part of the DLL
into memory until it is actually needed. Also, it will not require any
"fixups" to change addresses of calls within the DLL's code.
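The preferred address is stored in the DLL's PE header as the `ImageBase` field. The `SizeOfImage` field next to it gives the size of the address range the DLL occupies once mapped. Assuming the standard PE/COFF layout, the `struct` module is enough to read both fields, as the rough sketch below shows. `preferred_range()` and `preferred_range_of_file()` are hypothetical helper names. Reading only the first 4 KB of the file assumes the headers fit there, which is typical but not guaranteed.

```python
import struct


def preferred_range(image):
    """Return (image_base, size_of_image) from the raw bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew, at offset 0x3C, is the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at offset 28
        (base,) = struct.unpack_from("<I", image, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at offset 24
        (base,) = struct.unpack_from("<Q", image, opt + 24)
    else:
        raise ValueError("unknown optional-header magic 0x%x" % magic)
    # SizeOfImage is at offset 56 in both formats.
    (size,) = struct.unpack_from("<I", image, opt + 56)
    return base, size


def preferred_range_of_file(path):
    # Assumption: all headers lie within the first 4 KB of the file.
    with open(path, "rb") as f:
        return preferred_range(f.read(4096))
```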
However, if two DLL or PYD files have preferred address ranges that
overlap, one of them must be relocated at runtime, and a relocated DLL is
not "demand paged". Instead, the whole DLL is loaded immediately, even if
Chandler does not yet need all of the DLL's code. In addition, Windows has
to "fix up" addresses in the DLL during this process. So the cost in both
disk I/O and CPU time of loading a relocated DLL is proportional to the
size of the library.
As it happens, some of the largest DLLs we use have preferred addresses that
overlap those of other libraries we use, which means that we can end up
paying heavy relocation costs at Chandler start time.
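If you know each library's preferred base and size (for instance from the PE header fields `ImageBase` and `SizeOfImage`), finding collisions like these is a simple interval-overlap check. The sketch below is minimal: `overlapping_pairs()` is a hypothetical helper, and the names and addresses in the example are made up.

```python
def overlapping_pairs(images):
    """Find every pair of images whose preferred address ranges overlap.

    `images` is an iterable of (name, base, size) tuples with size > 0;
    each image occupies the half-open range [base, base + size).
    """
    ordered = sorted(images, key=lambda item: item[1])
    clashes = []
    for i, (name_a, base_a, size_a) in enumerate(ordered):
        end_a = base_a + size_a
        # Later entries start at or above base_a, so once one starts at or
        # beyond end_a, no further entry can overlap image A.
        for name_b, base_b, _size_b in ordered[i + 1:]:
            if base_b >= end_a:
                break
            clashes.append((name_a, name_b))
    return clashes
```

For example, `overlapping_pairs([("a.dll", 0x10000000, 0x20000), ("b.pyd", 0x10010000, 0x10000)])` returns `[("a.dll", "b.pyd")]`, because b.pyd's range starts inside a.dll's.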
Luckily, there is a simple way to change most DLLs' preferred addresses:
by modifying the DLL on disk. The Windows API includes a function that
relocates a DLL file to a specified address, and I found a short Python
script that uses it to "rebase" all of the DLLs and PYDs in a given
directory tree:
http://mail.python.org/pipermail/spambayes-dev/2004-July/002942.html
I modified this script, replacing the 'if __name__=="__main__"' block with
the following line:
rebase(True, 0x70000000, *get_images("release\\bin"))
This reset the preferred addresses of all our DLLs and PYDs so that they no
longer overlapped. Only three files were not relocated: python24.dll,
icudt32.dll, and icudt34.dll. I'm not sure why the python24.dll file can't
be relocated, but all the other libraries are moved out of its way, so it
probably doesn't matter much. The other two DLLs appear to be just
wrappers for data, and so do not seem to really have a "preferred
address". Hopefully this means that they are demand-paged regardless of
location.
After performing the rebasing, I then reran my earlier test. The delay
between the two prints dropped to roughly half a second, and the total test
time dropped to about 3 seconds. This strongly suggests that both the
"hang" of bug 4579 and total startup time are significantly affected by
collisions in DLLs' preferred addresses, and that it's worth investigating
whether the use of a rebasing tool can be incorporated as either a build
step or post-installation step.
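A more repeatable version of this before/after comparison would time several fresh interpreter runs and take the median, rather than relying on a single reading. The sketch below assumes the harness itself runs under Python 3. `time_import()` is a hypothetical helper, and `osaf.framework.scripting` would be the module passed in. Pointing `python` at Chandler's own interpreter is left to the caller.

```python
import statistics
import subprocess
import sys
import time


def time_import(module, runs=5, python=sys.executable):
    """Return the median wall-clock seconds for `python -c "import <module>"`.

    Each run is a fresh process, so per-process startup costs such as DLL
    loading and relocation are included every time. Later runs may benefit
    from the OS file cache warmed by earlier ones.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call([python, "-c", "import %s" % module])
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Taking the median damps one-off outliers, such as a slow first run against a cold disk cache. Measuring true cold-start behavior would need a separate setup.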
Since Python code can perform the rebasing operation (as long as the
targeted DLLs are not actually in use), we can potentially apply the
optimization to third-party plugins as well as our own code. Indeed, I'm
thinking it might be handy to have a way for the Python Eggs runtime to
perform rebasing automatically under certain circumstances, although that
won't happen for some time, if at all.
For now, the rebasing script is not suitable for production use: it
assumes a particular start address and a particular directory, and it
allocates new addresses by moving up in memory, even though the
Microsoft-recommended direction for DLL (as opposed to EXE) allocation is
downward. I will probably do some more research to determine whether the
script should be changed to allocate downward instead.
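A small planning function makes the difference between the two allocation directions easy to see. This sketch is hypothetical and does not reproduce the spambayes script's code. It only computes addresses, rounding each image's size up to the 64 KB granularity that Windows requires for image bases. Rewriting the files themselves is left to the Windows API function described above.

```python
ALIGNMENT = 0x10000  # 64 KB: image base addresses must be multiples of this


def _round_up(value, alignment=ALIGNMENT):
    return (value + alignment - 1) // alignment * alignment


def assign_bases(images, start, downward=False):
    """Plan non-overlapping preferred bases for (name, size) pairs.

    Upward: the first image gets `start`, and each next one sits above it.
    Downward: `start` is the top of the region, and each image ends at or
    below the base of the previous one. Returns a list of (name, base).
    """
    if start % ALIGNMENT:
        raise ValueError("start must be 64 KB aligned")
    plan = []
    cursor = start
    for name, size in images:
        span = _round_up(size)
        if downward:
            cursor -= span
            if cursor < 0:
                raise ValueError("ran out of address space below start")
            plan.append((name, cursor))
        else:
            plan.append((name, cursor))
            cursor += span
    return plan
```

With `start=0x70000000`, the upward plan mirrors what the modified script does. The downward plan packs the same images just below that address instead.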
Still, the performance improvement suggests that a more formal
investigation would be worthwhile: for example, running the script to
rebase the code on the Windows performance tinderbox and then measuring
the change in startup performance. It also suggests that we should
investigate whether Mac OS has similar base address collisions during
dynamic linking, since the PC pauses at a similar point during startup,
and rebasing reduces that pause.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev