Hi,

For quite a long time I have been bothered by the very large files needed for python extensions. In particular for numpy.core, which consists of a few files of ~1 MB each, I find this a pretty high barrier to entry for newcomers, and it has quite a big impact on the code organization. I think I have found a way to split things up on common platforms (this includes at least Windows, Mac OS X, Linux and Solaris), without impacting other, potentially less capable platforms, or static linking of numpy.

Assuming my idea is technically sound and that I can demonstrate that it works on, say, Linux without impacting other platforms (see example below), would that be considered useful?

cheers,

David

Technical details
==================

The rationale for doing things as they are is a C limitation: a symbol is either visible everywhere or limited to file scope. If you want to share a function between several files without making it public in the binary, you have to declare the function static and include all the .c files which use it into one giant .c file. That's how we do it in numpy. Many binary formats (ELF, COFF and Mach-O) have a mechanism to limit symbol visibility, so that we can explicitly set the functions we do want to export. With a couple of defines, we could either include every file and tag the implementation functions as static, or link every file together and limit symbol visibility with some linker magic.

Example
-------

I use the spam example from the official python doc, with one function PySpam_System which is exported in a C API, and whose actual implementation is _pyspam_implementation.

* spammodule.c: defines the interface available from the python interpreter:

    #include <Python.h>
    #include <stdio.h>

    #define SPAM_MODULE
    #include "spammodule.h"
    #include "spammodule_imp.h"

    /* if we don't know how to deal with symbol visibility on the
       platform, just include everything in one file */
    #ifdef SYMBOL_SCRIPT_UNSUPPORTED
    #include "spammodule_imp.c"
    #endif

    /* C API for spam module */
    static int
    PySpam_System(const char *command)
    {
        _pyspam_implementation(command);
        return 0;
    }

* spammodule_imp.h: declares the implementation. It should only be included by spammodule.c and by spammodule_imp.c, which implements the actual function:

    #ifndef _IMP_H_
    #define _IMP_H_

    #ifndef SPAM_MODULE
    #error this should not be included unless you really know what you are doing
    #endif

    #ifdef SYMBOL_SCRIPT_UNSUPPORTED
    #define SPAM_PRIVATE static
    #else
    #define SPAM_PRIVATE
    #endif

    SPAM_PRIVATE int _pyspam_implementation(const char *command);

    #endif

For supported platforms (where SYMBOL_SCRIPT_UNSUPPORTED is not defined), _pyspam_implementation would not be visible because we would have a list of functions to export (only initspam in this case).

Advantages
----------

This has several advantages on platforms where this is supported:

- more approachable code: source files which are thousands of lines long are difficult to follow
- faster compilation times: in my experience, compilation time doesn't scale linearly with the amount of code
- compilation can be better parallelized
- changing one file does not force a whole multiarray/ufunc module recompilation (which can be pretty long when you chase bugs in it)

Another advantage is related to namespace pollution. Since library extensions are static libraries for now, any symbol from those libraries used by any extension is publicly available. For example, now that multiarray.so uses the npy_math library, every symbol in npy_math is in the public namespace.
That's also true for every scipy extension (for example, _fftpack.so exports the whole dfftpack public API). If we want to go further down the road of making core computational code publicly available, I think we should improve this first.

Disadvantage
------------

We need to code it. There are two parts:

- numpy.distutils support: I already have something working for Linux. Once we have one platform working, adding others should not be a problem
- changing the C code: we could at first split things into .c files while still including everything, and then start the conversion.
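
For reference, the example above does not show spammodule_imp.c itself. A minimal sketch of what it could look like follows. The function body (a plain call to system(), as in the spam example of the python doc) and the export list in the comment (a GNU ld version script with the hypothetical name spam.map) are assumptions for illustration, not the actual numpy.distutils implementation. The SPAM_PRIVATE definitions are repeated from spammodule_imp.h only to keep the sketch self-contained.

```c
/* spammodule_imp.c: hypothetical sketch, not shown in the original example */
#include <stdlib.h>

/* In the real layout these definitions come from spammodule_imp.h;
   they are repeated here only so the sketch compiles on its own. */
#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#define SPAM_PRIVATE static   /* file is #included by spammodule.c */
#else
#define SPAM_PRIVATE          /* file is compiled and linked separately */
#endif

SPAM_PRIVATE int _pyspam_implementation(const char *command);

/* Assumed body: run the command through the shell and return its status. */
SPAM_PRIVATE int
_pyspam_implementation(const char *command)
{
    return system(command);
}

/*
 * Hypothetical export list for the separately linked case (a GNU ld
 * version script, here called "spam.map"), passed at link time with
 * -Wl,--version-script=spam.map:
 *
 *     {
 *         global: initspam;
 *         local:  *;
 *     };
 */
```

With such an export list, _pyspam_implementation has external linkage inside the extension but is not exported from the final shared object. Building with -DSYMBOL_SCRIPT_UNSUPPORTED falls back to the current scheme, where the function is static and the file is included by spammodule.c.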